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3 ABSTRACT 

4 Data assimilation leads naturally to a Bayesian formulation in which the posterior probability 

5 distribution of the system state, given all the observations on a time window of interest, 

6 plays a central conceptual role. The aim of this paper is to use this Bayesian posterior 

7 probability distribution as a gold standard against which to evaluate various commonly used 

8 data assimilation algorithms. 

9 A key aspect of geophysical data assimilation is the high dimensionality and limited 

10 predictability of the computational model. We study the 2D Navier-Stokes equations in a 

11 periodic geometry, which has these features and yet is tractable for explicit and accurate com- 

12 putation of the posterior distribution by state-of-the-art statistical sampling techniques. The 

13 commonly used algorithms that we evaluate, as quantified by the relative error in reproduc- 

14 ing moments of the posterior, are 4DVAR and a variety of sequential filtering approximations 

15 based on 3DVAR and on extended and ensemble Kalman filters. 

16 The primary conclusions are that under the assumption of a well-defined posterior prob- 

17 ability distribution: (i) with appropriate parameter choices, approximate filters can perform 

18 well in reproducing the mean of the desired probability distribution; (ii) however they do 

19 not perform as well in reproducing the covariance; (iii) the error is compounded by the need 

20 to modify the covariance, in order to induce stability. Thus, filters can be a useful tool in 

21 predicting mean behavior, but should be viewed with caution as predictors of uncertainty. 

22 These conclusions are intrinsic to the algorithms when assumptions underlying them are not 

23 valid and will not change if the model complexity is increased. 



24 1. Introduction 

25 The positive impact of data assimilation schemes on numerical weather prediction (NWP) 

26 is unquestionable. Improvements in forecast skill over decades reflect not only the increased 

27 resolution of the computational model, but also the increasing volumes of data available, 

28 and the increasing sophistication of algorithms to incorporate this data. However, because 

29 of the huge scale of the computational model, many of the algorithms used for data assimila- 

30 tion employ approximations, based on both physical insight and computational expediency, 

31 whose effect can be hard to evaluate. The aim of this paper is to describe a method of 

32 evaluating some important aspects of data assimilation algorithms, by comparing them with 

33 a gold-standard: the Bayesian posterior probability distribution on the system state given 

34 observations. In so doing we will demonstrate that carefully chosen filters can perform 

35 well in predicting mean behaviour, but that they typically perform poorly when predicting 

36 uncertainty, such as covariance information. 

37 In typical operational conditions the observed data, model initial conditions, and model 

38 equations are all subject to uncertainty. Thus we take the perspective that the gold standard, 

39 which we wish to reproduce as accurately as possible, is the (Bayesian) posterior probability 

40 distribution of the system state (possibly including parameters) given the observations. For 

41 practical weather forecasting scenarios this is not computable. The two primary competing 

42 methodologies for data as s imilat ion that are computabl e, and hence a re implemented in 



practice, are filters 



Bennett 



(J2OO2I ) . We will compare 



Kalnayl (120031 ) and variational methods 

44 the (accurately computed, extremely expensive) Bayesian posterior distribution with the 

45 output of the (approximate, relatively cheap) filters and variational methods used in practice. 

46 Our underlying dynamical model is the 2D Navier-Stokes equations in a periodic setting. 

47 This provides a high dimensional dynamical system, which exhibits a range of complex 

48 behaviours, yet which is sufficiently small that the Bayesian posterior may be accurately 

49 computed by state-of-the-art statistical sampling in an off-line setting. 

50 The idea behind filtering is to update the posterior distribution of the system state 



51 sequentially at each observation time. This may be performed ex actly for 



52 subject to Gaussian noise, and is then known as the Kalman filter 



53 (jl99ll ). For nonlinear or non- Gaussian scenarios the particle filter 



KalmanI fll96nh: 



inear systems 



Doucet et al 



H 



arvey 



(I2OOII) may 



54 be used and provably 



55 particles is increased 



approximates the desi red probability distribution as the number of 



Bain and Crisc 



poorly in high dimensional systems 



n 



2008 



Ho wever in practice this method performs 



Snyder et al. 



mm and, whilst t 



lere is considerable 



57 resear c h activity aimed a t over coming this degenertation 



(120101 ) 



Bengtsson et al. 



van Leeuwen 



(120101); 



Chorin et al 



( I2OO3I ). it cannot currently be viewed as a practical tool within the 

59 context of geophysical data assimilation. In order to circumvent problems associated with 

60 the representation of high dimensional probability distributions some form of Gaussian ap- 

61 proximation is typically used to create practical filters. The oldest and simplest such option 

62 is to use a nonlinear generalization of the mean update in the Kalman filter, employing a 

63 constant prior covariance operator, o btained o ffline through knowledge coming from the un- 



64 derlying model and past observations 



Lorend ( 1l986h : this methodology is sometimes refered 



65 to as 3DVAR. More sophisticated approximate Gaussian filt ers arise from ei ther lineariz 



66 ing the dynamical model, yielding the extended Kalman filter 



67 ensemble statistics, leading to the ensemble Kalman filter 



Jazwinski 



Evensen et al. 



9701). o r utilizing 



(119941 ) 



Evensen 



68 (120031 ). Informat i on ab out the underlying local (in time) Lyapunov vectors, or bred vec- 



tors (see 



Kalnayl (120031 ) for discussion) can be used to guide further approximations that 



are made when implementing these methods in hig h dimensions. We w i 
i n the use of Fourier diagonal filters, introduced in 



Harlim and Maidal (120081 ) 



I also 



De interested 



Majda et al 



72 (|2010[ ). which approximate the dynamical model by a statistically equivalent linear dynam- 

73 ical system in a manner which enables the covariance operator to be mapped forward in 

74 closed form; in steady state the version we employ here reduces to a particular choice of 

75 3DVAR, based on clima tological statistics. A n overview of particle filtering for geophysical 



76 systems may be f ound in 



may be found in 



Van Leeuwen 



Arulampalam et al. 



(!2002[ ) 



(|2009f) and a quick introduction to sequential ffltering 



Whilst filtering updates the system state sequentially each time when a new observation 
becomes available variational methods attempt to incorporate data which is distributed over 
an entire time-interval. This may be viewed as an optimization problem where the objective 
function is to choose the initial state, and possibly forcing to the physical model, in order 
to best match the data over the speci fied time-window. As such it may be viewed as a 



PDE-constrained optimization problem 
ular c lass of regularized inverse problem 



Hinzeetal 



Vogell (120021 ): 



fl2008f). and more 



Tarantolal ( 120051 ) 



g enerally as a partic- 



Banks and Kunisch 



85 (119891 ). This approach is referred to as 4DVAR in the geophysi cal literature, when the 



86 optirn iz ation is performed over j ust th e initial state of the system 



(119871 ) 



also over forcing to the system 



Talagrand and Courtier 



Courtier and Talagrand (Il987l) and as w eak constraint 4DVAR when optimization is 



Zupanski 



(119971 ). 



From a Bayesian perspective, the solution to an inverse problem is statistical, rather than 
deterministic, and is hence significantly more challenging: regularization is imposed through 
viewing the unknown as a random variable, and the aim is to find the posterior probability 
distribution on the state of the system on a given time window, given the observations on 
that time window. With the current and growing capacity of computers it is becoming 
relevant a nd tractable to begin to explo re such approaches to inverse problems in differential 



95 equations 



Kaipio and Somersald (120051 ) . even though it is currently infeasible to do so for 



96 NWP. There has, however, been some limited study of the Bayesian approach to inverse 



97 pr oblems in fluid me chani cs using pat 



98 m 



Apte et al. 



( 120071 ): see 



1 integra l fo rmulations in continuo u s time as introduced 



Apte et al 



(l2008a 



b^ 



Quinn and Abarbanell ( 120101 ): 



Cotter et al 



(1201 if ) for further developments. We will build on the algorithmic experience contained in 
these papers here. For a recent overvie w of Bayesian methodology for inverse problems in 
differential equations, see 



Stuart 



( 120101 ). and fo r the Bayesian forrn ulation of a variety of 



Cotter et al. 



(|2009[ ). The key take home 



inverse problems arising in fluid mechanics see 

message of this body of work on Bayesian inverse problems is that it is often possible to 

compute the posterior distribution of state given noisy data with high degree of accuracy. 



albeit at great expense: the methodology could not be used online as a practical algorithm, 
but provides us with a gold-standard against which we can evaluate on-line approximate 
methods used in practice. 

There are several useful connections to make between the Bayesian posterior distribu- 
tion, filtering methods and variational methods all of which serve to highlight the fact that 
they are all attempting to represent related quantities. The first observation is that, in the 
linear Gaussian setting, if backward filtering is implemented on a given time window (this is 
known as smoothin g) after forward filtering, then the resulting mean is equivalent to 4DVAR 



Fisher et al. 



(120051 ). The second observation is that the Bayesian posterior distribution at 
the end of the time window, which is a non-Gaussian version of the Kalman smoothing 
distribution just described, is equal to the exact filtering distribution at that time, provided 
the filter is initialized with the same distribution as that chosen at the start of the time 
window for the Bayesian posterior model 



Stuart 



(120101 ). The third observation is that the 



118 4DVAR variational method corresponds to maxi mizing the 



and is known in this context as a MAP estimator 



Codbaej) 



Bayesian posterior distr i 



3ution 



Kaipio and Somersald (120051 ). 



120 Mor e generally, connections bet ween filtering and smoothing have been understood for some 



time 



Bryson and Frazierl ( 1l963l ). 



For the filtering and variational algorithms implemented in practice, these connections 
may be lost, or weakened, because of the approximations made to create tractable algorithms. 
Hence we attempt to evaluate these algorithms by their ability to reproduce moments of the 
Bayesian posterior distribution since this provides an unequivocal notion of a perfect solution, 
given a complete model description, including sources of error; we hence refer to it as the gold 
standard. We emphasize that we do not claim to present optimal implementations of any 
method except the gold standard MCMC. Nonetheless, the phenomena we observe and the 
conclusions we arrive at will not change qualitatively if the algorithms are optimized. They 
reflect inherent properties of the approximations used to create online algorithms useable in 
practical online scenarios. 



132 The ability of filters to track the signal in chaotic systems has been t he object of s" 



133 data assimilation communities for some time and we point to the paper 



t udy in 



Miller et al 



f ll994h 



134 as an early example of this work, confined to low dimensional systems, and to the more 



135 recent 



Carrassi et al. 



( 120081 ) for study of both low and high dimensional problems, and for 

136 further discussion of the relevant literature. As mentioned above, we develop our evaluation 

137 in the context of the 2D Navier Stokes equations in a periodic box. We work in parameter 

138 regimes in which at most O(IO^) Fourier modes are active. This model has several attractive 

139 features. For instance, it has a unique global attractor with a tunable parameter, the viscosity 

140 (or, equivalently the Reynolds number), which tunes betw een a one- dime nsional stable fixed 



TemamI ( 120011 ). As the dimension 



point and very high-dimensional strongly chaotic attractor 

of the attractor increases, many scales are present, as one would expect in a model of the 

atmosphere. By working with dimensions of size O(IO^) we have a model of signi ficantly 



higher dime nsion than the typical toy models that one encounters in the literature 



Lorenz 



(11996 . 



19631 ). Therefore, while the 2D Navier-Stokes equations do not model atmospheric 
dynamics, we expect the model to exhibit similar predictability issues as arise atmospheric 
models, and this fact, together their high dimensionaliy, makes them a useful model with 
which to study aspects of atmospheric data assimilation. However we do recognize the need 
for follow-up studies which investigate similar issues for models such as Lorenz-96, or quasi- 
geostrophic models, which can mimic or model the baroclinic instabilities which drive so 
much of atmospheric dynamics. 

The primary conclusions of our study are that: (i) with appropriate parameter choices, 
approximate filters can perform well in reproducing the mean of the desired probability 
distribution; (ii) however these filters typically perform poorly when attempting to reproduce 
information about covariance as the assumptions underlying them may not be valid (iii) this 
poor performance is compounded by the need to modify the filters, and their covariance in 
particular, in order to induce filter stability and avoid divergence. Thus, whilst filters can be 



a useful tool in predicting mean behaviour, they should be viewed with caution as predictors 



of uncertainty. These conclusions are intrinsic to the algorithms and will not change if the 
model is more complex, for example resulting from a smaller viscosity in our model. We 
reiterate that these conclusions are based on our assumption of well-defined initial prior, 
observational error, and hence Bayesian posterior distributions. Due to the computational 
cost of computing the latter we look only at one, initial, interval of observations, but upon 
our assumption, the accuracy over this first interval will limit accuracy on all subsequent 
intervals, and they will not become better. Under the reasonable assumption that the process 
has finite correlation time, the initial prior will be forgotten eventually and, in the present 
context, this effect would be explored by choosing different priors coming from approximation 
of the asymptotic distribution by some filtering algorithm and/or climatological statistics 
and testing the robustness of conclusions, and indeed of the filtering distribution itself, to 
changes in prior. The question of sensitivity of the results to choice of prior is not addressed 
here. We also restrict our attention here to the perfect model scenario. 

Many com parisons of variou s versi o ns of these rn e thods have been carried out recently. 



173 For example. 



Meng and Zhang! (I2nin[ l: 



Zhang et al. 



( I2OIOI ) compare EnKF forecast with 



3DVAR and 4DVAR(without updated covariance) in the Weather Research and Forecast- 
ing (WRF) model. In their real-data experiments, they conclude that EnKF and 4DVAR 
perform better with respect the Root Mean Square Error (RMSE), while the EnKF forecast 
performs better for longer lead times. This result is consistent with ours, although it could be 
explained by an improved approximation of the posterior distribution at each update time. 
Our results indicate 4DVAR could perform better here, as long as the approximate filtering 
distribution of 4DVAR with the propagated Hessian is used. Of course this is too expensive 
in practice and often a constant covariance is used; this will limit performance in reproduc- 
ing the statistical variation of the posterior filtering distri b ution for prior in the nex t cycle. 



This issue is addressed partially in 



Meng and Zhand f l2010h 



Zhang and Zhang] ( 120121 ) , where 



EnKF is coupled to 4DVAR and the covariance comes from the former, while the mean is 
updated by the latter, and the resulting algorithm outperforms either of the individual ones 



7 



186 in the RMSE sense. T wo fundamental c lasses of EnKFs were compared theoretically in the 



large ensemble limit in 



Lei et al 



(l2010[ l. and it was found that the stochastic version (the 

188 one we employ here) in which observations are perturbed is more robust to perturbations 

189 in the forecast distribution t han t he deterministic one. Another interesting comparison was 



190 undertaken in 



Hamill et al 



(120001 ) in which several ensemble filters alternative to EnKF in 



191 operational use are compa r ed wi th respect to RMSE as well as other diagnostics such as 



192 rank histograms lAndersonI (Il996l ). We note that over long times the RMSE values for the 

193 algorithms we consider are in the same vicinity as the errors between the estimators and the 

194 truth that we present at the single filtering time. 

195 The rest of the paper will be organized in the following sections. First, we introduce 

196 the model and inverse problem in section [21 then we describe the various methods used to 

197 (approximately) compute posterior smoothing and filtering distributions in section [31 Then 

198 we describe the results of the numerical simulations in two sections. The first, section [H 

199 explores the accuracy of the filters by comparison with the posterior distribution and the 

200 truth. The second, section [5l explains the manifestation of instability in the filters, describes 

201 how they are stabilized, and studies implications for accuracy. We provide a summary 

202 and conclusions in section [6l In the Appendix [7| we describe some details of the numerical 

203 methods. 



204 2. Statement of the Model 

205 In this section we describe the dynamical model, and the filtering and smoothing prob- 

206 lems which arise from assimilating data into that model. The discussion is framed prior to 

207 discretization. Details relating to numerical implementation may be found in the Appendix 



m 



209 a. Dynamical Model: Navier-Stokes Equation 

210 The dynamical model we will consider is the two-dimensional incompressible Navier- 

211 Stokes equation in a periodic box with side of length two. By projecting into the space 

212 of divergence-free velocity fields, this may be written as a dynamical equation for the 

213 divergence-free velocity field u with the form 

dzL 

214 — + uAu + F{u) = f, m(0) = uq. (1) 

(JjL 

215 Here A (known as the Stokes operator) models the dissipation and acts as a (negative) 

216 Laplacian on divergence free fields, F{u) the nonlinearity arising from the convective time- 

217 derivative and / the body force, all projected into divergence free functions. We also work 

218 with spatial mean-zero velocity fields a s, in per i odic g eometries, the mean evolves indepen- 



219 dently of the other Fourier modes. See 



TemamI (1200 ll ) for details concering the formulation 



220 of incompressible fluid mechanics in this notation. We let "H denote the space of square- 

221 integrable, periodic and mean-zero divergence-free functions on the box. In order that our 

222 results are self-contained apart from the particular choice of model considered, we define the 

223 map \1'(-; t) : "H — 7- H so that the solution of ([1]) satisfies 

224 u(t) = ^(no;t). (2) 

225 Equation ([T]) has a global attractor and the viscosity parameter u tunes between regimes 

226 in which the attractor is a single stationary point, through periodic, quasi-periodic, chaotic, 

227 and strongly chaotic (the last two being delicate to distinguish between). These regimes are 

228 characterized by an increasing number of positive Lyapunov exponents, and hence increas- 

229 ing dimension of the unstable manifold. In turn, this results in a system which becomes 

230 progressively less predictable. This tunability through all predictability regimes, coupled to 

231 the possibility of high dimensional effective dynamics which can arise for certain parameter 

232 regimes of the PDE, makes this a useful model with which to examine some of the issues 

233 inherent in atmospheric data assimilation. 

9 



234 b. Inverse Problem 

235 The basic inverse problem which underhes data assimilation is to estimate the state of 

236 the system, given the model dynamics for the state, together with noisy observations of 

237 the state. In our setting, since the model dynamics are deterministic, this ammounts to 

238 estimating the initial condition from noisy observations at later times. This is an ill-posed 

239 problem which we regularize by adopting a Bayesian approach to the problem, imposing 

240 a prior Gaussian random field assumption on the initial condition. Throughout it will be 

241 useful to define || ■ ||_b = ll-B"^ ■ || for any covariance operator B and we use this notation 

242 throughout the paper, in particular in the observation space, with B = T and in the initial 

243 condition space with B = Cq. 

244 Our prior regularization on the initial state is to assume 

245 uor^ fio= A/'(mo, Co). (3) 

246 The prior mean tuq is our best guess of the initial state, before data is aquired (background 

247 mean) and the prior covariance Cq (background covariance) regularizes this by allowing 

248 variability with specified magnitude at different length-scales. The prior covariance Cq : "H — )■ 

249 Ti is self-adjoint and positive, and is assumed to have summable eigenvalues, a condition 

250 which is necessary and sufficient for draws from this prior to be square integrable. 

251 Now we describe the noisy observations. We observe only the velocity field, and not the 

252 pressure. Let F : H — )■ H be self-adjoint, positive operators and let 

z/fc~Ar(M(tfc),r) (4) 

254 denote noisy observations of the state at time t^ = kh which, for simplicity of exposition 

255 only, we have assumed to be equally spaced. We assume independence of the observational 

256 noise: yk\uk is independent of yj\uj for all j ^ k; and the observational noise is assumed 

257 independent of the initial condition uq. 

258 For simplicity and following convention in the field, we will not distinguish notationally 

259 between the random variable and its realization, exept in the case of the truth, which will 

10 



260 be important to distinguish by n^ in subsequent sections in which it will be simulated and 

261 known. The inverse problem consists of estimating the posterior probability distribution of 

262 u[t), given noisy observations {yk}k=o^ with j < J. This is referred to as 

263 • Smoothing when t < tj] 

264 • Filtering when t = tj] 

265 • Predicting when t > tj. 

266 Under the assumption that the dynamical model is deterministic, the smoothing distribution 

267 at time t = can be mapped forward in time to give the exact filtering distribution, which in 

268 turn can be mapped forward in time to give the exact predicting distribution (and likewise 

269 the filtering distribution mapped backward, if the forward map admits an inverse, yields 

270 the smoothing distribution). If the forward map were linear, for instance in the case of the 

271 Stokes equation {F{u) = 0), then the posterior distribution would be Gaussian as well, and 

272 could be given in closed form via its mean and covariance. In the nonlinear case, however, 

273 the posterior cannot be summarized through a finite set of quantities such as mean and 

274 covariance and, in theory, requires infinitely many samples to represent. In the language of 

275 the previous section, as the dimension of the attractor increases with Reynolds number, the 

276 nonlinearity begins to dominate the equation, the dynamics become less predictable, and the 

277 inverse problem becomes more difficult. In particular, Gaussian approximations can become 

278 increasingly misleading. We will see that sufficient nonlinearity for these misleading effects 

279 can arise more than one way, via the dynamical model or the observational frequency. 

280 1) Smoothing 

281 We start by describing the Bayesian posterior distribution, and link this to variational 

282 methods. Let u^ = u{kh), \&(n) = \I^(u;/i), and \&'^(-) = \&(-;A;/i). Furthermore, define the 
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283 conditional measures for ji,J2 < J 

285 (For notational convenience we do not distinguish between a probability distribution and its 

286 density, using fi and P intercliangably for both). The posterior distributions are completely 

287 characterized by the dynamical model in Eq. (|2]) and by the random inputs given in Eq. @ 

288 and Eq. ([3]). 

We focus on the posterior distribution /io| j since this probability distribution, once known, 
determines fij\j for all J > j > simply by using ([2]) to map the probability distribution at 
time t = into that arising at any later time t > 0. Bayes' rule gives a characterization of 
/Xo|j via the ratio of its density with respect to that of the prior q 

239 so that 

/^0|j(m) r ^, s. 

290 — ——— oc expj— <P(n)|, 

/io(u) 

291 where 

292 ^(u) 



l(j2\\yk-nu)\\i]. 

\fc=0 / 



293 The constant of proportionality is independent of u and irrelevant for the algorithms that 

294 we use below to probe the probability distribution /io|j- Note that here, and in what follows, 

295 u denotes the random variable uq. 

296 Using the fact that the prior fiQ is Gaussian it follows that the maximum a posteriori 

297 (MAP) estimator of fio\j is the minimizer of the functional 

I{u) = ^u) + -\\u-mo\\l^. (5) 



^ Note that our observations include data at time t ~ 0. Because the prior is Gaussian and the observa- 
tional noise is Gaussian we could alternatively redefine the prior to incorporate this data point, which can 
be done in closed form, and redefine the prior; the observations would then start at time t = h. 
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299 We let rfiQ = argmin^ -^(^)) that is rfiQ returns the value of u at which I{u) achieves its mini- 

300 mum. This so-called MAP estimator is, of course, simply the solution of the 4DVAR strong 

301 constraint variational method. The mathematical formulation of various inverse problems 

302 for the Navi er-Stokes equations , justifying the formal manipulations in this subsection, may 



303 be found in 



Cotter et al 



(120091) 



304 2) Filtering 

305 The posterior filtering distribution at time j given all observations up to time j can also 

306 be given in closed form by an application of Bayes' rule. The prior is taken as the predicting 

307 distribution: 



/^j|i-i(«i) = / PK>j-i)/^j-i|i-i(^Wi-i) (6) 



H 



6{uj - '^{uj_i))fij_iij_i{duj_i). 

H 

308 The 6 function appears because the dynamical model is deterministic. As we did for smooth- 

309 ing, we can apply Bayes rule to obtain the ratio of the density of fij^j with respect to fij\j-i 

310 to obtain 

^'^'^""^ ocexp{-<|.,(«)}, (7) 



'^i(w) = oll^J-^llr- 



312 where 

1 

2' 

314 Together ([6]) and ([7]) provide an iteration which, at the final observation time, yields 

315 the measure /ij|j. As mentioned in the introduction, this distribution can be obtained by 
315 evolving the posterior smoothing distribution /io|j forward in time under the dynamics given 
317 by ©. 
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318 3. Overview of Methods 

319 In this section, we provide details of the various computational methods we use to obtain 

320 information about the probability distribution on the state of the system, given observa- 

321 tions, in both the smoothing and filtering contexts. To approximate the gold standard, the 

322 Bayesian posterior distribution, we use state-of-the-art Markov chain Monte Carlo (MCMC) 

323 sampling for the inverse problem, to obtain a large number of samples from the posterior 

324 distribution, sufficient to represent its mode and the posterior spread around it. We also 

325 decribe optimization techniques to compute the MAP estimator of the posterior density, 

326 namely 4DVAR. Both the Bayesian posterior sampling and 4DVAR are based on obtaining 

327 information from the smoothing distribtion from subsection [1] Then we describe a vari- 

328 ety of filters, all building on the description of sequential filtering distributions introduced 

329 in subsection [2], using Gaussian approximations of one form or another. These filters are 

330 3DVAR, the Fourier Diagonal Filter, the Extended Kalman filter, and the Ensemble Kalman 

331 filter. We will refer to these filtering algorithms collectively as approximate Gaussian filters 

332 to highlight the fact that they are all derived by imposing a Gaussian approximation in the 

333 prediction step. 

334 a. MCMC Sampling of the Posterior 

335 We work in the setting of the Metropolis-Hastings variant of MCMC methods, employ- 



336 ing recently 



Cotter et al. 



developed methods which scale well with respect to system dimension; see 



( 2011 ) for further details and references. The resulting random walk method 

n 

338 that we use to sample from /io|j is given as follow^l: 

339 • Draw u^^^ ~ J\f{mo,Co) and set n = I. 



Define m* = ^/l - /32m(""1) + (1 - ^/l - (3^) 



mo. 



.p. denotes "with probability" 
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• Draw 

341 • Let a^""-^) = min< l,exp($(n("~^)) - $(u*)) i and set 

u* w.p. a("~^) 
I m('^-i) e/se. 

342 • n h^ n + 1 and repeat. 

343 After a burn-in period of M steps, {m^"''}^=a/ ~ Ato|j- This sample is then pushed forward 

344 to yield a sample of time- dependent solutions, {^'•"^(t)}, where u^'^\t) = \l/(M("^;t), or in 

345 particular in what follows, a sample of the filtering distribution {\1/'^m("^}. 

346 b. Variational Methods: 4DVAR 

347 As described in section [21 the minimizer of I defined in Eq. ([5]) defines the 4DVAR 

348 approximation, the basic variational method. A variety of optimization routines can be 

349 used to solve this problem. We have found Newton's method to be effective, with an initial 

350 starting point computed by homotopy methods starting from an easily computable problem. 

We now outline how the 4DVAR solution may be used to generate an approximation to 
the distribution of interest. The 4DVAR solution (MAP estimator) coincides with the mean 
for unimodal symmetric distributions. If the variance under yUo|j is small then it is natural 
to seek a Gaussian approximation. This has the form A/'(mo, Co) where 

Co' = D^I{mo) = D^^rho) + C^\ 

Here D^ denotes the second derivative operator. This Gaussian on the initial condition uq 
can be mapped forward under the dynamics, using linearization for the covariance, since it 
is assumed small, to obtain u{t) ^ J\f(jh(t),C{t)^ where m(t) = \l/(mo;t) and 

C{t) = D^{mo]t)CoD<iJ{mo;ty. 

351 Here D denotes the derivative operator, and * the adjoint. 
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352 c. Approximate Gaussian Filters 

353 Recall the key update formulae (El), (JTj). Note that the integrals are over the function space 

354 Ti, a. fact which points to the extreme computational complexity of characterizing probability 

355 distributions for problems arising from PDEs or their high dimensional approximation. We 

356 will describe various approximations, which are all Gaussian in nature, and make the update 

357 formulae tractable. We describe some generalities relating to this issue, before describing 
368 various method dependent specifics in following subsections. 

359 If "$ is nonlinear then /ij_i|j„i Gaussian does not imply fij\j-i is Gaussian; this follows 

360 from ([6]). Thus prediction cannot be performed simply by mapping mean and covariance. 

361 However, the update equation (jTj) has the property that, if fij\j~i is Gaussian then so is fij\j. 

362 If we assume that fij\j^i = Nirrij, Cj), then (jTj) shows that fij\j is Gaussian Afifhj, Cj) where 

363 rfij is the MAP estimator given by 

364 rhj = ai grain I j{u), (9) 

u 

(so that rhj minimizes Ij{u)) and 



Ij{u) = ^,j{u) + -\\u-mj\\l 



365 Note that, using ([8]), we see that Ij is a quadratic form whose minimizer is given in closed 

366 form as the solution of a linear equation with the form 

367 rhj = CjiCj^nij + T'^yj \ (10) 

368 where 

CJ^ = CJ^ + V~\ (11) 

If the output of the prediction step given by ([6]) is approximated by a Gaussian then this 
provides the basis for a sequential Gaussian approximation method. To be precise, if we 
have that 

^^i~l\i-l = A/'(mj„i,Cj_i) 
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370 and we have formulae, based on an approximation of ([6]), which enable us to compute the 

371 map 

372 (mj_.i,Cj_i) i-T- (mj,Cj) (12) 

373 then together ( TTOl) . ( TTTl) . (IT2i) provide an iteration for Gaussian approximations of the filtering 

374 distribution fijy of the form 

375 (mj_i,Cj_i) I— i- (mj,Cj). 

376 In the next few subsections we explain a variety of such approximations, and the resulting 

377 filters. 

378 1) Constant Gaussian filter (3DVAR) 

379 The constant Gaussian filter, referred to as 3DVAR, consists of making the choices rrij = 

380 \E'(mj„i) and Cj = C in (fT2|) . It is natural, theoretically, to choose C = Cq the prior covariance 

381 on the initial condition. However, as we will see, other issues may intervene and suggest or 

382 necessitate other choices. 

383 2) Fourier Diagonal Filter (FDF) 

384 A first step beyond 3DVAR, which employs constant covariances when updating to incor- 

385 porate new data, is to use some approxi r nate d ynamics in order to make the update ( fT2l) . In 



Harlim and Maidal fl2008[ ) 



Majda et al. 



( 120101 ) it is demonstrated that, in regimes exhibiting 



387 chaotic dynamics, linear stochastic models can be quite effective for this purpose: this is the 

388 idea of the Fourier Diagonal Filter. In this subsection we describe how this idea may be used, 

389 in both the steady and trubulent regir aes of the Navier- Stokes s ystem under consideration. 

390 For our purposes, and as observed in iHarlim and Majdal (120081 ) . this approach provides a 



391 rational way of deriving the covariances in 3DVAR, based on climatological statistics. 

392 The basic idea is, for the purposes of filtering, to replace the nonlinear map Uj+i = "^{uj] 
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393 by the linear (stochastic when Q 7^ 0) map 

394 Uj+i = Luj + y/Q^j- (13) 

395 Here it is assumed that L is negative definite and diagonal in the Fourier basis, Q has 

396 summable eigenvalues and is diagonal in the Fourier basis and ^j is a random noise chosen 

397 from the distribution M{0, 1). More sophisticated linear stochastic models could (and should) 

398 be used, but we employ this simplest of models to convey our ideas. 

399 If L = exp(— M/i) and Q = [I — exp(— 2M/i)]S, then (fT3|) corresponds to the discrete 

400 time h solution of the Ornstein-Uhlenbeck (OU) process 



401 du + Mudt = V2MEdW, 

402 where dW is the infinitesimal Brownian motion increment with identity covariance. The 

403 stationary solution is A/'(0, S) and letting Mk^k = o^k, the correlation time for mode k can 

404 be computed as l/a^. We employ three models of the form (1131) in this paper, labelled a), 

405 b) and c), and detailed below. Before turning to them, we describe how this linear model is 

406 incorporated into the filter. 

In the case of linear dynamics such as these, the map flT2|) is given in closed form 

ruj = Lrhj_i, Cj = LCj^iL* + Q. 

407 This can be improved, however, in the spirit of 3DVAR, by updating only the covariance in 

408 this way and mapping the mean under the nonlinear map, to obtain the following instance 

409 of (112]): 



m 



'J 



^(mj_i), Cj = LCj^iL* + Q. 



411 We implement the method in this form. We note that, because L is negative-definite, the 

412 covariance Cj converges to some Coo which can be computed explicitly, and, asymptotically, 

413 the algorithm behaves like 3DVAR with a systematic choice of covariance. We now turn to 

414 the choices of L and Q. 
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415 Model (a) is used in the stationary regime. It is found by setting L = exp{—uAh) 

416 and taking Q = el where e = 10"^^. Although this does not correspond to an accurate 

417 linearization of the model in low wave numbers, it is reasonable for high wave numbers. 
Model (b) is used in the strongly c haoti c regime, and is based on the original idea in 



Harhm and Majdal (120081 ) 



Majda et al. 



(120 lOl ). The two quantities Sk^k and ak are matched 



to the statistics of the dynamical model, as follows. Let u{t) denote the solution to the 
Navier-Stokes equation ([T]) which, abusing notation, we assume to be represented in the 
Fourier domain, with entries Uk(t). Then u and S are given by the formulae 



U 



1 [^ 

lim — / u(t)dt, 



lim — 



(n(t) - n) ® (n(t) - n)*dt. 







418 In practice these integrals are approximated by finite discrete sums. Furthermore, we set 

419 the off-diagonal entries of H to zero to obtain a diagonal model. We set o"| = r^k^k- Then the 

420 ak are computed using the formulae 



M{t,T) = {u{t - t) - u) (S) {u{t) - u)* 
Corrfc(r) 



limT^oo^/o Mk,k{t,T)dt 
o°°Re(Corr^(r))cir 



421 Again, finite discrete sums are used to approximate the integrals. 

422 3) Low Rank Extended Kalman Filter (LRExKF) 

423 The idea of the extended Kalman filter is to assume that the desired distributions are 

424 approximately Gaussian with small covariance. Then linearization may be used to show that 
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425 a natural approximation of f lT2|) is the map 



a 



m,- 



^(mj_i), Cj = D^!{rhj^i)Cj^iD'^{rhj^ 



(14) 



427 Updating the covariance this way requires one forward tangent hnear solve and one adjoint 

428 solve for each dimension of the system, and is therefore prohibitively expensive for high 

429 dimensional problems. To overcome this we use a low rank approximation to the covariance 

430 update. 

We write this explicitly as follows. Compute the dominant m eigenpairs of Cj as defined 
in Eq. (fT^ : these satisfy 

431 Define the rank m matrix M. = VAV* and note that this captures the essence of the 

432 covariance implied by the extended Kalman filter, in the directions of the m dominant 

433 eigenpairs. When the eigenvalues are well-separated, as they are here, a small number of 

434 eigenvalues capture the majority of the action and this is very efficient. We then implement 
436 the filter 

436 ruj = '^{mj_i), Cj = M + el (15) 

437 where e = 10~^^ as above. The perturbation term prevents degeneracy. 

438 The notion of keeping track of the unstable directions of the dynamical model is not new, 

439 although our pa rticular implementation d i ffers in some d e tails. For discu s sions a nd examples 



of this idea see 



fl2003h 



Toth and Kalnav 



Auvinen et al. 



fl2009h . and 



1997) 



Palmer et al 



Hamill et al 



fl2000h . 



fll998[ ). 



Kalnavl fl2003[ ) 



Leutbecher 



■^As an aside, we note a more sophisticated improved version we have not seen yet in the hterature would 
include the higher-order drift term involving the Hessian. Although adding significant expense there could 
be scenarios in which this is worthwhile to attempt this. 
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442 4) Ensemble Kalman Filter (EnKF) 



The Ensemble K alman Filter, introduced in lEvensen et al.l ( 1l994l ) and overviewed in 



Evensenl ( 12003 



20091 ) , is slightly outsid e the framework of the previous three filters and there 



445 are many versions (see 



Lei et al. 



(I2OIOI ) for a comparison between two major categories.) This 

446 is because the basic object which is updated is an ensemble of particles, not a mean and 

447 covariance. This ensemble is used to compute an empirical mean and convariance. We 

448 describe how the basic building blocks of approximate Gaussian filters, namely ( fTOl) . ( fTTl) 

449 and (TT2l) . are modified to use ensemble statistics. 

We start with (fT2l) . Assuming one has an ensemble {rhj^i} ~ A/'(mj_i,Cj_i), (fT2l) is 



replaced by the approximations 



m 



(n) 



(n) 



^(m%) 



m,- = — y 



n=l 



(n) 



and 



1 ^ 



(16) 



n=l 



452 The equation (jTOl) is approximated via an ensemble of equations found by replacing rrij by 



453 m 



(n) 



{«)l 



and replacing yj by independent draws {Vj} from Af{yj,r). This leads to updates of 

454 the ensemble members rrij h-)> rhj' whose sample mean yields ttij. For infinite particles, 

455 the sample covariance yields Cj. In the comparisons we consider the covariance to be the 

456 analytical one Cj = (/ — Cj^i{Cj^i + r)~^)Cj_i as in ( fTTl) . rather than the ensemble sample 

457 covariance, which yields the one implicitly in the next update ( 1T2|) . The discrepancy between 

458 these can be large for small samples and in different situations it may have either a positive 

459 or negative effect on the filter divergence discussed in Section [51 Solutions of the ensemble of 

460 equations of form (fTOj) are implemented in the standard Kalman filter fashion; this does not 

461 involve computing the inverse covariances which appear in (ITT]) . There are many variants 



462 on the EnKF and we h ave simp 



Tippett et al. 



(120031) and 



Evensen 



chos en one representative version. See, for example. 



(120091). 
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464 4. Filter Accuracy 

465 In this section we describe various aspects of the accuracy of both variational methods 

466 (4DVAR) and approximate Gaussian filters, evaluating them with respect to their effective- 

467 ness in reproducing the following two quantities: (i) the posterior distribution on state given 

468 observations; (ii) the truth u^ which gives rise to the observations. The first of these is found 

469 by means of accurate MCMC simulations, and is then characterized by three quantities: its 

470 mean, variance, and MAP estimator. It is our contention that, where quantification of un- 

471 certainty is important, the comparison of algorithms by their ability to predict (i) is central; 

472 however many algorithms are benchmarked in the literature by their ability to predict the 

473 truth (ii) and so we also include this information. A comparison of the algorithms with (iii) 

474 the observational data is also included as a useful check on the performance of the algorithms. 

475 Note that studying the error in (i) requires comparison of probability distributions; we do 

476 this primarily through comparison of mean and covariance information. In all our simula- 

477 tions the posterior distribution, and the distributions implied by the variational and filtering 

478 algorithms, are approximately Gaussian; for this reason studying the mean and covariance 

479 is sufficient. We note that we have not included model error in our study: uncertainty in 

480 the dynamical model comes only through the initial condition; thus attempting to match 

481 the "truth" is not unnatural in our setting. Matching the posterior distribution is, however, 

482 arguably more natural and is a concept which generalizes in a straightforward fashion to 

483 the inclusion of model error. In this section all methods are presented in their "raw" form, 

484 unmodified and not optimized. Modifications that are often used in practice are discussed 

485 in the next section. 

486 a. Nature of Approximations 

487 In this preliminary discussion we make three observations which help to guide and un- 

488 derstand subsequent numerical experiments. For the purposes of this discussion we assume 
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489 that the MCMC method, our gold standard, provides exact samples from the desired poste- 

490 rior distribution. There are then two key approximations underlying the methods which we 

491 benchmark against MCMC in this section. The first is the Gaussian approximation, which 

492 is made in 3DVAR/FDF, 4DVAR (when propagating from t = to t = T), LRExKF and 

493 EnKF; the second additional approximation is sampling, which is made only in EnKF. The 

494 3DVAR and FDF methods make a universal, steady approximation to the covariance whilst 

495 4DVAR, LRExKF and EnKF all propagate the approximate covariance using the dynamical 

496 model. Our first observation is thus that we expect 3DVAR and FDF to underperform the 

497 other methods with regard to covariance information. The second observation arises from 

498 the following: the predicting (and hence smoothing and filtering) distribution will remain 

499 close to Gaussian as long as there is a balance between dynamics remaining close to linear 

500 and the covariance being small enough (i.e. there is an appropriate level of either of these 

501 factors which can counteract any instance of the other one). In this case the evolution of 

502 the distribution is well approximated to leading order by the non-autonomous linear system 

503 update of ExKF, and similarly for the 4DVAR update from t = to t = T. Our second 

504 observation is hence that the bias in the Gaussian approximation will become significant if 

505 the dynamics is sufficiently non-linear or if the covariance becomes large enough. These two 

506 factors which destroy the Gaussian approximation will be more pronounced as the Reynolds 

507 number increases, leading to more, and larger, growing (local) Lyapunov exponents, and as 

508 the time interval between observations increases, allowing further growth or, for 4DVAR, 

509 as the total time-interval grows. The third and final observation concerns EnKF methods. 

510 In addition to making the Gaussian approximation, these rely on sampling to capture the 

511 resulting Gaussian. Hence the error in the EnKF methods will become significant if the 

512 number of samples is too small, even when the Gaussian approximation is valid. Further- 

513 more, since the number of samples required tends to grow with both dimension and with 

514 the inverse of the size of the quantity being measured, we expect that EnKF will encounter 

515 difficulties in this high dimensional system which will be exacerbated when the covariance 
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516 is small. 

517 We will show in the following that in the stationary case, and for high frequency obser- 

518 vations in the strongly chaotic case, the ExKF does perform well because of an appropriate 

519 balance of the level of nonlinearity of the dynamics on the scale of the time between obser- 

520 vations and the magnitude of the covariance. Nonetheless, a reasonable sized ensemble in 

521 the EnKF is not sufficiently large for the error from that algorithm to be dominated by the 

522 ExKF error, and it is instead determined by the error in the sample statistics with which 

523 EnKF approximates t he mean and covaria nce; this latter effect was demonstrated on a sim- 



524 pier model problem in lApte et al.l (l2008bl ). When the observations are sufficiently sparse in 

525 time in the strongly chaotic case the Gaussian approximation is no longer valid and even the 

526 ExKF fails to recover accurate mean and covariance. 

527 b. Illustration via Two Regimes 

528 This section is divided into two subsections, each devoted to a dynamical regime: sta- 

529 tionary, and strongly chaotic. The true initial condition u^ in the case of strongly chaotic 

530 dynamics is taken as an arbitrary point on the attractor obtained by simulating an arbitrary 

531 initial condition until statistical equilibrium. The initial condition for the case of stationary 

532 dynamics is taken as a draw from the Gaussian prior, since the statistical equilibrium is the 

533 trivial one. Note that in the stationary dynamical regime the equation is dominated by the 

534 linear term and hence this regime acts as a benchmark for the approximate Kalman filters, 

535 since they are exact in the linear case. Each of these sections in turn explores the particular 

536 characteristics of the filter accuracy inherent to that regime as a function of time between 

537 observations, h. The final time, T, will mostly be fixed, so that decreasing h will increase 

538 the density of observations of the system on a fixed time domain; however, on several occa- 

539 sions we study the effect of fixing h and changing the final time T. Studies of the effect on 

540 the posterior distribution of increasing the nu mber of observations are undertaken for some 



541 simple inverse problems in fluid mechanics in ICotter et al. 
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( 1201 ll ) and are not undertaken 



542 here. 

We now explain the basic format of the tables which follow and indicate the major 
features of the filters that they exhibit. The first 8 rows each correspond to a method of 
assimilation, while the final two rows correspond to the truth, at the start and end of the 
time window studied, for completeness. Labels for these rows are given in the far left column. 
The posterior distribution (MCMC) and MAP estimator (4DVAR) are each obtained via the 
smoothing distribution, and hence comparson is made at the intial time, t = 0, and at the 
final time, t = T, by mapping forward. For all other methods, the comparison is only with 
the filtering distribution at the final time, t = T. The columns each indicate the relative 
error of the given filter with a particular diagnostic quantity of interest. The first, third, 
fourth and fifth columns show e = \\M{t) — ?7i(t)||/||M(t)||, where M is, respectively, the 
mean of the posterior distribution found by MCMC and denoted E,u(t), the truth u^{t), the 
observation y(t), or the MAP estimator (4DVAR) at time t (either or T) and ?7i(t) is the 
time t mean of the filtering (or smoothing) distribution obtained from each of the various 
methods. The norm used is the L^([— 1, 1) x [—1, 1)) norm. The second column shows 

||var(n(t)) - var(f/(t)) || 
||var(n(t)) || 

643 where var indicates the variance, u is sampled from the posterior distribution (via MCMC), 

544 and U is the Gaussian approximate state obtained from the various methods. The subscripts 

545 in the titles in the top row indicate which relative error is given in that column. 

546 The following universal observations can be made independent of model parametric 

547 regime. 

548 • The numerical results support the three observations made in the previous subsection. 

549 • Most algorithms do a reasonably god job of reproducing the mean of the posterior 

550 distribution. 

551 • The LRExKF and 4DVAR both do a reasonably good job of reproducing the variance 

552 of the posterior distribution if the Reynolds number is sufficiently small and/or the 
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553 observation frequency high; otherwise there are circumstances where the approxima- 

554 tions underlying the ad hoc filters are not justified and they then fail to reproduce 

555 covariance information with any accuracy. 

556 • All other algorithms perform poorly when reproducing the variance of the posterior 
567 distribution. 

558 • All estimators of the mean are uniformly closer to the truth than the observations for 

559 all h. 

560 • In almost all cases, the estimators of the mean are closer to the mean of the posterior 

561 distribution than to the truth. 

562 • The error of the estimators of the mean with respect to the truth tends to increase 

563 with increasing h. 

564 • The error of the mean with respect to the truth decreases for increasing number of 

565 observations. 

566 • LRExKF usually has the smallest error with respect to the posterior mean and some- 

567 times accurately recovers the variance. 

568 • The error in the variance is sometimes overestimated and sometimes underestimated, 

569 and usually this is wavenumber-dependent in the sense that the variance of certain 

570 modes is overestimated and the variance of others is under-estimated. This will be 

571 discussed further in the next section. 

572 • The posterior smoothing distribution becomes noticeably non-Gaussian although still 

573 unimodal, while the filtering distribution remains very close to Gaussian. 
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574 c. Stationary Regime 

575 In the stationary regime, z/ = 0.1, the basic time-step used is dt = 0.05, the smallest h 

576 considered is h = 0.2, and we fix T = 2 as the filtering time at which to make comparisons 

577 of the approximate filters with the moments of the posterior distribution via samples from 

578 MCMC, the MAP estimator from 4DVAR, the truth, and the observations. Figure [U shows 

579 the vorticity, w (left), and Fourier coefficients, \uk\ (right), of the smoothing distribution at 

580 t = in the case that h = 0.2. The top panels are the mean of the posterior distribution 

581 found with MCMC, (Em), and the bottom panels are the truth, m^(0). The MAP estimator 

582 (minimizer of I{u), tHq = argmin J) is not shown because it is not discernable from the mean 

583 in this case. Notice that the mean (and MAP estimator) on the initial condition resemble 

584 the large-scale structure of the truth, but are more rough. This roughness is caused by 

585 the presence of the prior mean tuq drawn according to the distribution A/'(m^(0),Co). The 

586 solution operator \1/ immediately removes this roughness as it damps high wavenumbers; this 

587 effect can be seen in the images of the smoothing distribution mapped forward to time t = T, 

588 i.e. the filtering distribution, in Figure [2] ( here only the mean is shown, as neither the truth 

589 nor the MAP estimator are distinguishable from it). This is apparent in the data in the 

590 tables discussed below, in which the distance between the truth, the posterior distribution, 

591 and the MAP estimator are all mutually much closer for the final time than the initial; 

592 this contraction of the errors in time is caused by the underlying dynamics which involves 

593 exponential attraction to a unique stationary state. This is further exihibited in Figure [3] 

594 which shows the histogram of the smoothing distribution for the real part of a sample mode, 

595 Ui 1, at the initial time (left) and final time (right). 

596 Table [1] presents data for increasing h = 0.2,1,2, with T = 2 fixed. Notable trends, 

597 in addition to those mentioned at the start of this section, are as follows: (i) the 4DVAR 

598 smoothing distribution has much smaller error with respect to the mean at t = T than at 

599 t = 0, with the former increasing and the latter decreasing for increasing h; the error of 

600 4DVAR with respect to the mean and the variance at t = and t = T are close to or below 
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601 the threshold of accuracy of MCMC; (iii) the error of both the mean and the variance of 

602 3DVAR tend to decrease with increasing h; 

603 d. Strongly Chaotic Regime 

604 In the strongly chaotic regime, z/ = 0.01, the basic time-step used is dt = 0.005, the 

605 smallest h considered is h = 0.02, and we fix T = 0.2 or T = 1 as the filtering time at which 

606 to make comparisons of the approximate filters. In this regime, the dynamics are significantly 

607 more nonlinear and less predictable, with a high-dimensional attractor spanning many scales. 

608 Indeed the energy spectrum decays like E(k) = liras^o Jq^ Ji. E,\ui\'^ldld9 oc k""^/^ for 

609 \k\ < kf, with kf the magnitude of the forcing wavenumber, and much more rapidly for 

610 \k\ > kf. See the left panel of Figure H] for the average spectrum of the solution on the 

611 attractor and Fig. [5] for an example snapshot of the solution on the attractor. The flow 

612 is not in any of the classical regimes of cascades, but there is an upscale transfer of energy 

613 because of the forcing at intermediate scale. The viscosity is not negligible even at the largest 

614 scales, thereby allowing statistical equilibrium; this may be thought of as being generated 

615 by the empirical measure on the global attractor whose existence is assured for all z/ > 0. 

616 We confirmed this with simulations to times of order O^lO^u) giving O(IO^) samples with 

617 which to compute the converged correlation statistics used in FDF. 

618 Small perturbations in the directions of maximal growth of the dynamics grow substan- 

619 tially over the larger times between observations we look at, while over the shorter times the 

620 dynamics remain well approximated by the linearization. See the right panel of Figure H] for 

621 an example of the local maximal growth of perturbations. Figure [5] shows the initial and final 

622 time profiles of the mean as in Figures [1] and |2l Now that the solutions themselves are more 

623 rough, it is not possible to notice the influence of the prior mean at t = 0, and the profiles 

624 of the truth and MAP are indistinguishable from the mean throughout the interval of time. 

625 The situation in this regime is significantly different from the situation close to a stationary 

626 solution, primarily because the dimension of the attractor is very large and the dynamics on 
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627 it are very unpredictable. Notice in Figure [6] (top) that the uncertainty in un now barely 

628 decreases as we pass from initial time t = to final time t = T. Indeed for moderately high 

629 modes, the uncertainty increases (see |6] (bottom) for the distribution of U55). 

630 Table [2] presents data for increasing h = 0.02,0.1,0.2, with T = 0.2 fixed. Table |3] 

631 shows data for increasing h = 0.2, 0.5 with T = 1 fixed. Notable trends, in addition to 

632 those mentioned at the start of the section, are: (i) when computable, the variance of the 

633 4DVAR smoothing distribution has smaller error at t = than at t = T; (ii) the 4DVAR 

634 smoothing distribution error with respect to the variance cannot be computed accurately 

635 for T = 1 because of accumulated error for long times in the aproximation of the adjoint of 

636 the forward operator by the discretization of the analytical adjoint; (iii) the error of 4DVAR 

637 with respect to the mean at t = for h < 0.1 is below the threshold of accuracy of MCMC; 

638 (iv) the error in the variance for the FDF algorithm is very large because the Q is an order 

639 of magnitude larger than F; (v) the FDF algorithm is consistent in recovering the mean for 

640 increasing h, while the other algorithms deteriorate; (vi) the error of FDF with respect to 

641 the variance decreases with increasing h; (vii) for h = 0.5 and T = 1 the FDF performs best 

642 and these desirable properties of the FDF variant on 3DVAR are associated with stability 

643 and will be discussed in the next section; (viii) for increasing h, the error in the mean of 

644 LRExKF increases first when h = 0.1 and T = 0.2 and becomes close to the error in the 

645 variance which can be explained by the bias induced by neglecting the next order of the 

646 expansion of the dynamics; (ix) the error in LRExKF is substantial when T = 1 and it really 

647 majorly fails when h = 0.5 which is consistent with the time-scale on which nonlinear effects 

648 become prominent (see Fig. Hj) and the linear approximation would not be expected to be 

649 valid. The error in the mean is larger, again as expected from the Ito correction term. 
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660 5. Filter Stability 

651 Many of the accuracy results for the filters described in the previous section are degraded 

652 if, as is common practice in applied scenarios, modifications are made to ensure that the 

653 algorithms remain stable over longer time-intervals; that is if some form of variance inflation 



654 is performed to keep the algori t hm c 



655 filter divergence (see 



Jazwinskil (jl970[ l 



o se to the tru e signa l, or to pr e vent it from suffering 



Fisher et al 



(120051) 



Evensen 



( 120091 ). and references 



656 therein). In this section we describe some of the mathematics which underlies stabilization, 

657 describe numerical results illustrating it, and investigate its effect on filter accuracy. The 

658 basic conclusion of this section is that stabilization via variance inflation enables algorithms 
669 to be run for longer time windows before diverging, but may cause poorer accuracy in both the 

660 mean (before divergence) and the variance predictions. Again, we make no claims of optimal 

661 implementation of these filters, but rather aim to describe the mechanism of stabilization 

662 and the common effect, in general, as measured by ability to reproduce the gold standard 

663 posterior distribution. 

664 We define stability in this context to mean that the distance between the truth and the 

665 estimated mean remains small. As we will demonstrate, whether or not this distance remains 

666 small depends on whether the observations made are sufficient to control any instabilities 

667 inherent in the model dynamics. To understand this issue it is instructive to consider the 

668 3DVAR, FDF and LRExKF filters, all of which use a prediction step ( fT2l) which updates the 

669 mean using rrij = \E'(mj_i). When combined with the data incorporation step fITOl) we get 

670 an update equation of the form 



m 



i+i 



(/ - Kj)^{mj) + Kjyj+^, 



(17) 



where Kj = (C,:^ + T ^) T ^ is the Kalman gain matrix. If we assume that the data is 
derived from a true signal Uj satisfying Uj^^ = '^{Uj) and that 

Vj+i = m]+i + Vj = ^(m]) + Vj, 
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672 where the rjj denote the observation errors, then we see that flT7|l has the form 

673 rhj+i = {I - Kj)'${mj) + Kj'^iu]) + KjTJj+i. (18) 

674 If the observational noise is assumed to be consistent with the model used for the assimilation, 

675 then rjj ~ A/'(0, F) are i.i.d. random variables and we note that (TT8|) is an inhomogenous 

676 Markov chain. 

677 Note that 

«]+i = {!- ^,)^(^) + ^.^(«]) (19) 

so that defining the error Cj := ttij—Uj and subtracting flT9l) from flTSl) we obtain the equation 

ej+i ^ (/ - Kj)Djej + KjTIj+i 

679 where Dj = D'^{uj). The stability of the filter will be governed by families of products of 

680 the form 

U';z',{{I-K^)D^), k = l,...,J. 

682 We observe that / — Kj will act to induce stability, as it has norm less than one in appro- 

683 priate spaces; Dj, however, will induce some instability whenever the dynamics themeslves 

684 contain unstable growing modes. The balance between these effects - stabilization through 

685 observation and instability in the dynamics - determines whether the overall algorithm is 

686 stable. 

687 The operator Kj weights the relative importance of the model and the observations, 

688 according to covariance information. Therefore, this weighting must effectively stabilize the 

689 growing directions in the dynamics. Note that increasing Cj - variance inflation - has the 

690 effect of moving Kj towards the identity, so the mathematical mechanism of controlling 

691 the instability mechanism by variance inflation is elucidated by the discussion above. In 

692 particular, when the assimilation is proceeding in a stable fashion, the modes in which 

693 growing directions have support typically overestimate the variance to induce this stability. 

694 In unstable cases, there are at least some times at which some modes in which growing 
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695 directions have support underestimate the variance, leading to instabihty of the filter. It is 

696 always the case that the onset of instability occurs when the distance from th e estimated 



697 mean to the truth persistently exceeds the estimated standard deviation. In 



Brett et al. 



698 (120101 ) we provide the mathematical details and rigorous proofs which underpin the preceding 

699 discussion. 

700 In the following, two observations concerning the size of the error are particularly in- 

701 structive. Firstly, using the distribution assumed on the rjj, the following lower bound on 

702 the error is immediatcl: 

703 E||ej+i|p > E\\Kjr]j+if = tr{KjTK*). (20) 

704 This implies that the average scale of the error of the filter, with respect to the truth, is set by 

705 the scale of the observation error, and shows that the choice of the covariance updates, and 

706 hence the Kalman gain Kj, will affect the exact size of the average error, on this scale. The 

707 second observation follows from considering the trivial "filter" obtained by setting Kj = I, 

708 which corresponds to simply setting rhj = yj so that all weight is placed on the observations. 

709 In this case the average error is equal to 

E||e,-+i||' = E||r/,-+i||' = tr(r). (21) 

711 As we would hope that incorporation of the model itself improves errors we view (12T]) as 

712 providing an upper bound on any reasonable filter and we will consider the filter "unstable" 

713 if the squared error from the truth exceeds tr(r). Thus we use (!2T|) and (!20|) as guiding 

714 upper and lower bounds when studying the errors in the filter means in what follows. 

715 In cases where our basic algorithm is unstable in the sense just defined we will also imple- 

716 ment a stabilized algorithm, by adopting the commonly used practice of variance inflation. 

717 The discussion above demonstrates how this acts to induce stability by causing the Kj to 

718 move closer to the identity. For 3DVAR this is achieved by taking the original Cq and re- 

719 defining it via the transformation Cq — )■ -Cq. In all the numerical computations presented 



*Here E denotes expectation with respect to the random variables 7]j. 
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720 in this paper which concern the stabihzed version of 3DVAR we take e = 0.01. The FDF(b) 

721 algorithm remains stable since it already has an inflated variance via the model error term. 

722 For LRExKF we achieve variance inflation by replacing the perturbation term of Equation 

723 [15] with (/ — VV*)Cj{I — VV*), where Cj is the covariance arising from FDF(b). Finally 

724 we discuss stabilization of the EnKF. This is achieved by taking the original Cj's given by 

725 (IT6l) and redefining them via the transformations Co — )■ -Cq, and Cj — )■ (1 + ei)Cj + EpCo 

726 with e = 10~^,ej = 0.1, ep = 0.01. The parameter e prevents initial divergence, e, main- 

727 tains stability with direct incremental inflation and Sp provides rank correction. This is only 

728 one option out of a wide array of possible hueristically derived such transformations. For 

729 example, rank correction is often performed by some form of localization which preserves 

730 trace and eliminates long-range correlations, while our rank correction preserves long-range 

731 correlations and provides trace inflation. The point here is that our transformation captures 

732 the essential mechanism of stabilization by inflation which, again, is our objective. 

733 We denote the stabilized versions of 3DVAR, LRExKF, and EnKF by [3DVAR] , [LRExKF] , 

734 and [EnKF] . Because FDF itself always remains stable we do not show results for a stabilized 

735 version of this algorithm. Note that we use ensembles in EnKF of equal size to the number 

736 of approximate eigenvectors in LRExKF, in order to ensure comparable work. This is always 

737 100, except for large h, when some of the largest 100 eigenvalues are too close to zero to 

738 maintain accuracy, and so fewer eigenvectors are used in LRExKF in these cases. Also, note 

739 again that we are looking for general features across methods and are not aiming to optimize 

740 the inflation procedure for any particular method. 

741 Examples of an unstable instance of 3DVAR and the corresponding stabilized filter, 

742 [3DVAR], are depicted in Figures [8] and |9l respectively, with u = 0.01, h = 0.2. In this 

743 regime the dynamics are strongly chaotic. The first point to note is that both simulations 

744 give rise to an error which exceeds the lower bound f l20|) : and that the unstable algorithm 

745 exceeds the desired bound (121 p . whilst the stabilized algorithm does not; note also that the 

746 stabilized algorithm output is plotted over a longer time-interval than the original algorithm. 
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747 A second noteworthy point relates to the power of using the dynamical model: this is manifest 

748 in the bottom right panels of each figure, in which the trajectory of a high wavenumber 

749 mode, close to the forcing frequency, is shown. The assimilation performs remarkably well 

750 for the trajectory of this wavenumber relative to the observations in the stabilized case, 

751 owing to the high weight on the dynamics and stability of the dynamical model for that 

752 wavenumber. Examples of an unstable instance of LRExKF and the corresponding stabilized 

753 filter, [LRExKF], are depicted in Figures [TOl and [TTl respectively, with v = 0.01, h = 0.5. 

754 The behaviour illustrated is very similar to that exhibited for 3DVAR and [3DVAR]. 

765 In the following tables we make a comparison between the original form of the filters and 

756 their stabilized forms, using the gold standard Bayesian posterior distribution as the desired 

757 outcome. Table H] shows data for h = 0.02 and 0.2 with T = 0.2 fixed. Tables [5] and |6] show 

758 data for h = 0.2 and 0.5 with T = 1 fixed. We focus our discussion on the approximation 

759 of the mean. It is noteworthy that, on the shorter time horizon T = 0.2, the stabilized 

760 algorithms are less accurate with respect to the mean than their original counterparts, for 

761 both values of observation time h; this reflects a lack of accuracy caused by inflating the 

762 variance. As would be expected, however, this behaviour is reversed on longer time-intervals, 

763 as is shown when T = 1.0, relfecting enhanced stability cased by inflating the variance. Table 

764 [5] shows the case T = 1.0 with h = 0.2, and the stabilized version of 3DVAR outperforms the 

765 original version, although the stabilized versions of EnKF and LRExKF are not as accurate 

766 as the original version. In Table |6l with h = 0.5 and T = 1.0, the stabilized versions improve 

767 upon the original algorithms in all three cases. Furthermore, in Table O we also display the 

768 FDF showing that, without any stabilization, this outperforms the other three filters and 

769 their stabilized counterparts. 
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770 6. Conclusion 

771 Incorporating noisy data into uncertain computational models presents a major challenge 

772 in many areas of the physical sciences, and in atmospheric modelling and NWP in particular. 

773 Data assimilation algorithms in NWP have had measurable positive impact on forecast skill. 

774 Nonetheless, assessing the ability of these algorithms to forecast uncertainty is more subtle. 

775 It is important to do so, however, especially as prediction is pushed to the limits of its 

776 validity in terms of time horizons considered, or physical processes modelled. In this paper 

777 we have proposed an approach to the evaluation of the ability of data assimilation algorithms 

778 to predict uncertainty. The cornerstone of our approach is to adopt a fully non-Gaussian 

779 Bayesian perspective in which the probability distribution of the system state over a time 

780 horizon, given data over that time horizon, plays a pivotal role: we contend that algorithms 

781 should be evaluated by their ability to reproduce this probability distribution, or important 

782 aspects of it, accurately. 

783 In order to make this perspective useful it is necessary to find a model problem which 

784 admits complex behaviour reminiscent of atmospheric dynamics, whilst being sufficiently 

785 small to allow computation of the Bayesian posterior distribution, so that data assimilation 

786 algorithms can be compared against it. Although MCMC sampling of the posterior can, in 

787 principle, recover any distribution, it becomes prohibitively expensive for multi-modal distri- 

788 butions, depending on the energy barriers between modes. However for unimodal problems, 

789 state-of-the-art sampling techniques allow fully resolved MCMC computations to be un- 

790 dertaken. We have found that the 2D Navier-Stokes equations provide a model for which 

791 the posterior distribution may be accurately sampled using MCMC, in regimes where the 

792 dynamics is stationary and where it is strongly chaotic. We have confined our attention 

793 to strong constraint models, and implemented a range of variational and filtering meth- 

794 ods, evaluating them by their ability to reproduce the Bayesian posterior distribution. The 

795 set-up is such that the Bayesian posterior is unimodal and approximately Gaussian. Thus 

796 the evaluation is undertaken by comparing the mean and covariance structure of the data 
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797 assimilation algorithms against the actual Bayesian posterior mean and covariance. Simi- 

798 lar studies wer e unde rtaken in the context of a subsurface geophysical inverse problem in 



799 iLiu and Oliverl (120031 ) . although the conclusions were less definitive. It would be interesting 

800 to revisit such subsurface geophysical inverse problems using the state-of-the-art MCMC 

801 techniques adopted here, in order to compute the posterior distribution. Moreover it would 

802 be interesting to conduct a study, similar to that undertaken here, for models of atmo- 

803 spheric dynamics such as Lorenz-96, or a quasi-geostrophic models, which admit baroclinic 

804 instabilities. 

805 These studies, under the assumption of a well-defined posterior probability distribution, 

806 lead to four conclusions: (i) most filtering and variational algorithms do a reasonably good 

807 job of reproducing the mean; (ii) for most of the filtering and variational algorithms studied 

808 and implemented here there are circumstances where the approximations underlying the ad 

809 hoc filters are not justified and they then fail to reproduce covariance information with any 

810 accuracy (iii) most filtering algorithms exhibit instability on longer time-intervals causing 

811 them to lose accuracy in even mean prediction; (iv) filter stabilization, via variance inflation 

812 of one sort or the other, ameliorates this instability and can improve long-term accuracy of 

813 the filters in predicting the mean, but can reduce the accuracy on short time intervals and 

814 of course makes it impossible to predict the covariance. In summary most data assimilation 
816 algorithms used in practice should be viewed with caution when using them to make claims 

816 concerning uncertainty although, when properly tuned, they will frequently track the signal 

817 mean accurately for fairly long time intervals. These conclusions are intrinsic to the algo- 

818 rithms, and result from the nature of the approximations made in order to create tractable 

819 online algorithms; the basic conclusions are not expected to change by use of different dy- 

820 namical models, or by modifying the parameters of those algorithms. 

821 Finally we note that we have not addressed in this paper the important but complicated 

822 issue of how to choose the prior distribution on the initial condition. We finish with some 

823 remarks concerning this. The "accuracy of the spread" of the prior is often monitored in 
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824 practice with a rank histogram lAndersonI (jl996l ). This can be computed even in the absence 

825 of an ensemble for any method in the class of those discussed here, by partitioning the 

826 real line in bins according to the assumed Gaussian prior density. It is important to note 

827 that uniform component-wise rank histograms in each direction guarantee that there are no 

828 directions in which the variance is consistently underestimated, and this should therefore be 

829 sufficient for stability. It is also necessary for the accurate appr o ximat ion of the Bayesian 



Hamill et al 



(12000). Indeed, one can 



Hamill et al. 
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830 posterior distribution, but by no means sufficient 

831 iteratively compute a constant prior with the cycled 3DVAR algorithm 

832 such that the estimator from the algorithm will have statistics consistent with the constant 

833 prior used in the algorithm. The estimator produced by this algorithm is guaranteed by 

834 construction to yield uniform rank histograms of the type described above, and yet the 

835 actual prior coming from the posterior at the previous time is not constant, so this cannot 

836 be a good approximation of the actual prior. See Fig. [7] for an image of the variance which is 

837 consistent with the statistics of the estimator over 100 iterations of 3DVAR with v = 0.01 and 

838 h = 0.5, as compared with the prior, posterior, and converged FDF variance at T = 1. Notice 

839 FDF overestimates in the high-variance directions, and underestimates in the low-variance 

840 directions (which correspond in our case to the unstable and stable directions, respectively). 

841 The RMSE of 3DVAR with constant converged FDF variance is smaller than with constant 

842 variance from converged statistics, and yet the former clearly will yield component-wise rank 

843 histograms which appear to always underestimate the "spread" in the low-variance, stable 

844 directions, and overestimate in the high-variance, unstable directions. It is also noteworthy 

845 that the FDF variance accurately recovers the decay of the posterior variance, but is about 

846 an order of magnitude larger. Further investigation of how to initialize statistical forecasting 

847 algorithms clearly remains a subject presenting many conceptual and practical challenges. 
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852 7. Appendix: Some numerical details 

853 Here we provide some details of the numerical algorithms underlying the computations 

854 which we present in the main body of the paper. First, we will describe the numerical 

855 methods used for the dynamical model. Secondly we study the adjoint solver. Thirdly we 

856 discuss various issues related to the resulting optimization problems and large linear systems 

857 encountered. Finally we discuss the MCMC method used to compute the gold standard 

858 posterior probability distribution. 

859 In the dynamical and observational models the forcing in Eq. [T]is taken to be / = V'^ip, 

860 where ip = cos{k ■ x) and V"*" = JV with J the canonical skew-symmetric matrix, and 

861 k = (1, 1) for stationary {u = 0.1) regime, while k = (5, 5) for the strongly chaotic regime in 

862 order to allow an upscale cascade of energy. Furthermore, we set the observational noise to 

863 white noise F = 7^/, where 7 = 0.04 is chosen as 10% of the maximum standard deviation 

864 of the strongly chaotic dynamics, and we choose an initial smoothness prior Co = A~'^, 

865 where A is the Stokes operator. We notice that only the observations on the unstable 

866 manifol d of the underlying solutio n map need to be assimilated. A similar observation was 



867 made in IChorin and Krausd fl2004l ) in the context of particle filters. Our choice of prior and 

868 observational covariance reflect this in the sense that the ratio of the prior to the observational 

869 covariance is larger for smaller wavenumbers (and greater than 1, in particular), in which the 

870 unstable manifold has support, while this ratio tends to zero as \k\ -^ 00. The initial mean, 

871 or background state, is chosen as rriQ ~ J^{u\Cq), where m^ is the true initial condition. 

872 In the case of strongly chaotic dynamics it is taken as an arbitrary point on the attractor 

873 obtained by simulating an arbitrary initial condition until statistical equilibrium. The initial 

874 condition for the case of stationary dynamics is taken as a draw from the Gaussian prior. 



since the statistical equilibrium is the trivial one. 

Our numerical method for the dynamical model is based on a Galerkin approximation of 
the velocity field in a divergence -free Fourier basis. We use a modification of a fourth-order 



Runge-Kutta method, ETD4RK 



Cox and Matthewsl (120021 ). in which the heat semi-group is 



used together wit h Duhamers prin c iple t o solve exactly for the diffusion term. A spectral 



Galerkin method 



Hesthaven et al 



(120071 ) is used in which the convolutions arising from 



products in the nonlinear term are computed via FFTs. We use a double-sized domain in each 
dimension, buffered with zeros, resulting in 64^ grid-point FFTs, and only half the modes are 
retained when transforming back into spectral space in order to prevent de-aliasing which 
is avoided as long as fewer than 2/3 the modes are retained. Data assimilation in practice 
always contends with poor spatial resolution, particularly in the case of the atmosphere 
in which there are many billions of degrees of freedom. For us the important resolution 
consideration is that the unstable modes, which usually have long spatial scales and support 
in low wave-numbers, are resolved. Therefore, our objective here is not to obtain high spatial- 
resolution but rather to obtain high temporal-resolution in the sense of reproducibility. We 
would like the divergence of two close-by trajectories to be dictated by instability in the 
dynamical model rather than the numerical time-stepping scheme. 

It is also important that we have accurate adjoint solvers, and this is strongly linked 
to the accuracy of the forward solver. The same time-stepper is used to solve the adjoint 
equation, with twice the time-step of the forward solve, since the forward solution is re- 
quired at half-steps in order to implement this method for the non-autonomou s adjoint solve . 



896 Many issues can arise in the implementation of adjoint, or costate methods 



Banksl (Il992[ ) 



Vogel and Wadd ( 119951 ) and the practitioner should be aware of these. The easiest way to 
ensure convergence is to test that the tangent linearized map is indeed the linearization of 
the solution map and then confirm that the adjoint is the adjoint to a suitable threshold. We 
have taken the approach of "optimize then discretize" here, and as such our adjoint model 
is the discretization of the analytical adjoint. This effect becomes apparent in the accuracy 
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of the linearization for longer time intervals, and we are no longer able to compute accurate 
gradients and Hessians as a result. 

Regarding linear algebra and optimization issues we make the following observations. A 
Krylov method (GMRES) is used for linear solves in the Newton method for 4DVAR, and 
the Arnoldi method is used for low-rank covariance approximations in LRExKF and for 
the filtering time T covariance approximation in 4DVAR. The LRExKF always sufficiently 
captures more than 99% of the full rank version as measured in Frobenius (matrix P) norm. 
The initial Hessian in 4DVAR and well as the ones occuring within Newton's method are 
computed by finite difference. Using a gradient flow (preconditioned steepest descent) com- 
putation, we obtain an approximate minimizer close to the actual minim izer and then a 



912 preconditioned Newton-Krylov nonlinear flxed-point solver is used (nsoli 



his appro ach is akin to the Levenburgh-Marquardt algorith m. See 



914 fll997h and 



KellevI (120031)'). 



Trefethen and Bau 



Nocedal and Wright 



fll999h 



SaadI (119961 ) for overviews of the linear algebra and 
for an overview of optimization. Strong constraint 4DVAR can be computationally challeng- 
ing and, although we do not do so her e, it wou l d be interesting to study weak constraint 



917 4DVAR from a related perspective; see 



Brockerl (120 lOl ) for a discussion of weak constraint 



918 4DVAR in continuous time. It is useful to employ benchmarks in order to con firm gradients 



919 are be ing computed properly when implementing optimizers, see for example 



Lawless et al. 



(120031). 

Finally, we comment on the MCMC computations which, of all the algorithms imple- 
mented here, lead to the highest computational cost. This, of course, is because it fully 
resolves the posterior distribution of interest whereas the other algorithms use crude ap- 
proximations, the consequences of which we study by comparison with accurate MCMC 
results. Each time-step requires 4 function evaluations, and each function evaluation re- 
quires 8 FFTs, so it costs 32 FFTs for each time-step. We fix the lengths of paths at 40 
time-steps for most of the computations, but nonetheless, this is on the order of 1000 FFTs 
per evaluation of the dynamical model. If a 64^ FFT takes 1 ms, then this amounts to 1 s 
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929 per sample. Clearly this is a hurdle as it would take on the order of 10 days to obtain on the 

930 order of millions of samples in serial. We overcome this by using the MAP estimator (4DVAR 

931 solution) as the initial condition in order to accellerate burn-in, and then run independent 

932 batches of 10^ samples in parallel with independent seeds in the random number generator. 

933 We also minimize comp utional effort within the method by employing the technique of early 



934 rejection introduced by iHaariol (l2010l ) which means that rejection can be detected before the 

935 forward computation required for evaluation of $ reaches the end of the assimilation time 

936 window; the computation can then be stopped and hence computational savings made. 

937 It is important to recognize that we cannot rely too heavily on results of MCMC with 

938 smaller relative norm than 10"'^ for the mean or 10^^ for the variance, because we are 

939 bound to 0{N^^/'^) convergence and it is already prohibitively expensive to get several 

940 million samples. More than 1 0^ is not tractable. Convergence is measured by a version of 



941 MSPRF lBrooks and GelmanI (jl998l ). evi,^ = ||var[Mi(t)] — var[M8(t)]||/||var[Mi(t)]||, where Ui 

942 corresponds to sample statistics with 1 chain and ug corresponds to sample statistics over 

943 8 chains. We find ewi:8 = (9(10~^) for A^ = 3.2 x 10^ samples in each chain. If we define 

944 emi:8 = ||E[Mi(t)] -E[us{t)]\\/\\E[ui{t)]\\, then we have enii-.g = O{10~^). 



REFERENCES 

947 Anderson, J., 1996: A method for producing and evaluating probabilistic forecasts from 

948 ensemble model integrations. Journal of Climate, 9 (7), 1518-1530. 

949 Apte, A., M. Hairer, A. Stuart, and J. Voss, 2007: Sampling the posterior: an approach to 
960 non-Gaussian data assimilation. Physica D, 230, 50-64. 

951 Apte, A., C. Jones, and A. Stuart, 2008a: A Bayesian approach to Lagrangian data assimi- 



lation. Tellus, 60, 336-347. 



41 



953 Apte, A., C Jones, A. Stuart, and J. Voss, 2008b: Data assimilation: Mathematical and 

954 statistical perspectives. International Journal for Numerical Methods in Fluids, 56 (8), 
965 1033-1046. 

956 Arulampalam, M., S. Maskell, N. Gordon, and T. Clapp, 2002: A tutorial on particle filters 

957 for online nonlinear/non-gaussian bayesian tracking. Signal Processing, IEEE Transactions 

958 on, 50 (2), 174-188. 

959 Auvinen, H., J. Bardsley, H. Haario, and T. Kauranne, 2009: Large-scale kalman filtering 

960 using the limited memory bfgs method. Electronic Transactions on Numerical Analysis, 

961 35, 217-233. 

962 Bain, A. and D. Cri§an, 2008: Fundamentals of stochastic filtering. Springer Verlag. 

963 Banks, H., 1992: Computational issues in parameter estimation and feedback control prob- 

964 lems for partial differential equation systems. Physica D: Nonlinear Phenomena, 60 (1-4), 

965 226-238. 

966 Banks, H. and K. Kunisch, 1989: Estimation techniques for distributed parameter systems. 

967 Birkhauser Boston. 

968 Bengtsson, T., C Snyder, and D. Nychka, 2003: Toward a nonlinear ensemble filter for 

969 high- dimensional systems. J. Geophys. Res, 108 (D24), 8775. 

970 Bennett, A., 2002: Inverse Modeling of the ocean and Atmosphere. Cambridge. 

971 Brett, C, A. Lam, K. Law, D. McCormick, M. Scott, and A. Stuart, 2010: Stability of filters 

972 for the navier-stokes equation, preprint. 

973 Brocker, J., 2010: On variational data assimilation in continuous time. Quarterly Journal of 

974 the Royal Meteorological Society, 136 (652), 1906-1919. 

975 Brooks, S. and A. Gelman, 1998: General methods for monitoring convergence of iterative 

976 simulations. Journal of Computational and Graphical Statistics, 7 (4), 434-455. 

977 Bryson, A. and M. Frazier, 1963: Smoothing for linear and nonlinear dynamic systems. 

978 Proceedings Optimum System Sythensis Conference, US Air Force Tech. Rep. AFB-TDR- 

979 63-119. 

42 



980 Carrassi, A., M. Ghil, A. Trevisan, and F. Uboldi, 2008: Data assimilation as a nonlin- 

981 ear dynamical systems problem: Stability and convergence of the prediction-assimilation 

982 system. Chaos: An Interdisciplinary Journal of Nonlinear Science, 18, 023112. 

983 Chorin, A. and P. Krause, 2004: Dimensional reduction for a bayesian filter. Proceedings of 

984 the National Academy of Sciences of the United States of America, 101 (42), 15 013. 

985 Chorin, A., M. Morzfeld, and X. Tu, 2010: limplicit particle filters for data assimilation. 

986 Communications in Applied Mathematics and Computational Science, 221. 

987 Cotter, S., M. Dashti, J. Robinson, and A. Stuart, 2009: Bayesian inverse problems for 

988 functions and applications to fluid mechanics. Inverse Problems, 25, 115 008. 

989 Cotter, S., M. Dashti, and A. Stuart, 2011: Variational data assimilation using targetted 

990 random walks. Int. J. Num. Meth. Fluids. 

991 Courtier, P. and O. Talagrand, 1987: Variational assimilation of meteorological observations 

992 with the adjoint vorticity equation, ii: Numerical results. Quarterly Journal of the Royal 

993 Meteorological Society, 113 (478), 1329-1347. 

994 Cox, H., 1964: On the estimation of state variables and parameters for noisy dynamic 

995 systems. Automatic Control, IEEE Transactions on, 9 (1), 5-12. 

996 Cox, S. and P. Matthews, 2002: Exponential time differencing for stiff systems. Journal of 

997 Computational Physics, 176 (2), 430-455. 

998 Doucet, A., N. De Freitas, and N. Gordon, 2001: Sequential Monte Carlo methods in practice. 

999 Springer Verlag. 

1000 Evensen, C, 2003: The ensemble kalman filter: Theoretical formulation and practical im- 

1001 plementation. Ocean dynamics, 53 (4), 343-367. 

1002 Evensen, C, 2009: Data assimilation: the ensemble Kalman filter. Springer Verlag. 

1003 Evensen, C, P. Van Leeuwen, et al., 1994: Assimilation of geosat altimeter data for the 

1004 agulhas current using the ensemble kalman filter with a quasi-geostrophic model. Monthly 

1005 Weather. 

1006 Fisher, M., M. Leutbecher, and G. Kelly, 2005: On the equivalence between kalman smooth- 

43 



1007 ing and weak-constraint four- dimensional variational data assimilation. Quarterly Journal 
looB of the Royal Meteorological Society, 131 (613), 3235-3246. 

1009 Haario, H., 2010: Early rejection in metropolis-hastings. Private communication. 

1010 Hamill, T., C Snyder, and R. Morss, 2000: A comparison of probabilistic forecasts from bred, 

1011 singular- vector, and perturbed observation ensembles. Monthly Weather Review, 128 (6), 

1012 1835-1851. 

1013 Harlim, J. and A. Majda, 2008: Filtering nonlinear dynamical systems with linear stochastic 

1014 models. Nonlinearity, 21, 1281. 

1015 Harvey, A., 1991: Forecasting, structural time series models and the Kalman filter. Cam- 

1016 bridge Univ Pr. 

1017 Hesthaven, J., S. Gottlieb, and D. Gottlieb, 2007: Spectral methods for time- dependent 
loiB problems. Vol. 21. Cambridge Univ Pr. 

1019 Hinze, M., R. Pinnau, M. Ulbrich, and S. Ulbrich, 2008: Optimization with PDF constraints. 

1020 Springer Verlag. 

1021 Jazwinski, A., 1970: Stochastic processes and filtering theory. Academic Pr. 

1022 Kaipio, J. and E. Somersalo, 2005: Statistical and computational inverse problems. Springer 

1023 Science-I- Business Media, Inc. 

1024 Kalman, R., 1960: A new approach to linear filtering and prediction problems. Journal of 

1025 basic Engineering, 82 (Series D), 35-45. 

1026 Kalnay, E., 2003: Atmospheric modeling, data assimilation, and predictability. Cambridge 

1027 Univ Pr. 

102B Kelley, C, 2003: Solving nonlinear equations with Newton's method. Vol. 1. Society for 

1029 Industrial Mathematics. 

1030 Lawless, A., N. Nichols, and S. Ballard, 2003: A comparison of two methods for developing 

1031 the linearization of a shallow-water model. Quarterly Journal of the Royal Meteorological 

1032 Society, 129 (589), 1237-1254. 

1033 Lei, J., P. Bickel, and C. Snyder, 2010: Comparison of ensemble kalman filters under non- 

44 



1034 gaussianity. Monthly Weather Review, 138 (4), 1293-1306. 

1035 Leutbecher, M., 2003: Adaptive observations, the hessian metric and singular vectors. Proc. 

1036 ECMWF Seminar on Recent developments in data assimilation for atmosphere and ocean, 

1037 Reading, UK, 8-12. 

1038 Liu, N. and D. S. Oliver, 2003: Evaluation of monte carlo methods for assessing uncertainty. 

1039 SPE Journal, 8 (2), 188-195. 

1040 Lorenc, A., 1986: Analysis methods for numerical weather prediction. Quarterly Journal of 

1041 the Royal Meteorological Society, 112 (474), 1177-1194. 

1042 Lorenz, E., 1963: Deterministic nonperiodic flowl. Atmos J Sci, 20, 130-141. 

1043 Lorenz, E., 1996: Predictability: A problem partly solved. Proc. Seminar on Predictability, 

1044 Vol. 1, 1-18. 

1045 Majda, A., J. Harlim, and B. Gershgorin, 2010: Mathematical strategies for filtering turbu- 

lent dynamical systems. DYNAMICAL SYSTEMS, 27 (2), 441-486. 

1047 Meng, Z. and F. Zhang, 2010: Tests of an ensemble kalman filter for mesoscale and regional- 

1048 scale data assimilation, part iv: Comparison with 3dvar in a month- long experiment. 

1049 Miller, R., M. Ghil, and F. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear 

1050 dynamical systems. Journal of the Atmospheric Sciences, 51 (8), 1037-1037. 

1051 Nocedal, J. and S. Wright, 1999: Numerical optimization. Springer verlag. 

1052 Palmer, T., R. Gelaro, J. Barkmeijer, and R. Buizza, 1998: Singular vectors, metrics, and 

1053 adaptive observations. Journal of the atmospheric sciences, 55 (4), 633-653. 

1054 Quinn, J. and H. Abarbanel, 2010: State and parameter estimation using monte carlo eval- 

1055 nation of path integrals. Quarterly Journal of the Royal Meteorological Society. 

1056 Saad, Y., 1996: Iterative methods for sparse linear systems. PWS Pub. Co. 

1057 Snyder, T., T. Bengtsson, P. Bickel, and J. Anderson, 2008: Obstacles to high-dimensional 

1058 particle filtering. Monthly Weather Review., 136, 4629-4640. 

1059 Stuart, A., 2010: Inverse problems: a Bayesian perspective. Acta Numerica, 19 (-1), 451- 



45 



1060 559. 

1061 Talagrand, O. and P. Courtier, 1987: Variational assimilation of meteorological observations 

1062 with the adjoint vorticity equation, i: Theory. Quarterly Journal of the Royal Meteorolog- 

1063 teal Society, 113 (478), 1311-1328. 

1064 Tarantola, A., 2005: Inverse problem theory and methods for model parameter estimation. 

1065 Society for Industrial Mathematics. 

1066 Temam, R., 2001: Navier-Stokes equations: theory and numerical analysis. Amer Mathe- 

1067 matical Society. 

1068 Tippett, M., J. Anderson, C Bishop, T. Hamill, and J. Whitaker, 2003: Ensemble square 

1069 root filters. Monthly weather review, 131 (7), 1485-1490. 

1070 Toth, Z. and E. Kalnay, 1997: Ensemble forecasting at ncep and the breeding method. 

1071 Monthly Weather Review, 125, 3297. 

1072 Trefethen, L. and D. Bau, 1997: Numerical linear algebra. 50, Society for Industrial Mathe- 

1073 matics. 

1074 Van Leeuwen, P., 2009: Particle filtering in geophysical systems. Monthly Weather Review, 

1075 137, 4089-4114. 

1076 van Leeuwen, P., 2010: Nonlinear data assimilation in geosciences: an extremely efficient 

1077 particle filter. Quarterly Journal of the Royal Meteorological Society, 136 (653), 1991- 

107B 1999. 

1079 Vogel, C, 2002: Computational methods for inverse problems. Society for Industrial Mathe- 

1080 matics. 

1081 Vogel, C. and J. Wade, 1995: Analysis of costate discretizations in parameter estimation for 

1082 linear evolution equations. SIAM journal on control and optimization, 33 (1), 227-254. 

1083 Zhang, M. and F. Zhang, 2012: E4dvar: Coupling an ensemble kalman filter with four- 

1084 dimensional variational data assimilation in a limited-area weather prediction model. 

1085 Monthly Weather Review- Boston, 140 (2), 587. 

1086 Zhang, M., F. Zhang, X. Huang, and X. Zhang, 2010: Inter-comparison of an ensemble 

46 



1087 kalman filter with three-and four-dimensional variational data assimilation methods in a 

1088 limited-area model over the month of June 2003. Monthly Weather Review. 

1089 Zupanski, D., 1997: A general weak constraint applicable to operational 4dvar data assimi- 

1090 lation systems. Monthly Weather Review, 125, 2274-2292. 



47 



List of Tables 



Stationary state regime, u = 0.1, T = 2, with h = 0.2 (top table), h = 1 (mid- 
dle), and h = 2 (bottom). The first, third, fourth and fifth columns are the 
norm difference, e=||M — m||/||M||, where M is the mean of the posterior 
distribution (MCMC), the truth, the observation, or the MAP estimator and 
m is the mean obtained from the various methods. The second column is the 
norm difference, e = ||var[n] — var[f/]||/||var[u]|| where var indicates the vari- 
ance, u is sampled from the posterior (via MCMC), and U is the approximate 

state obtained from the various methods |5Q 

Same as Tabled! except for strongly chaotic regime with u = 0.01, T = 0.2, 

and h = 0.02 (top), 0.1 (middle) and 0.2 (bottom) Isi 

Same as Table |2l except T = 1, and h = 0.2 (top) and h = 0.5 (bottom). 
The variance is ommitted from the 4DVAR solutions here, because we are 
unable to attain solution with zero derivative. We must note here that we 
have taken the approach of differentiating and then discretizing. Therefore, 
over longer time intervals such as this, the error between the discretization of 
the analytical derivative and derivative of the finite-dimensional discretized 
forward map accumulates and the derivative of the objective function is no 
longer well-defined because of this error. Nonetheless, we confirm that we do 
obtain the MAP estimator because the MCMC run does not yield any point 

of higher probability. |52 

The data of unstable algorithms from Table E] (z/ = 0.01, T = 0.2) are re- 
produced above (with h = 0.02(top) and h = 0.2(bottom)), along with the 
respective stabilized versions in brackets. Here the stabilized versions usually 
perform worse. Note that over longer time scales, the unstabilized version will 
diverge from the truth, while the stabilized one remains close |53 
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Same as Table HI except T = 5h = 1 and h = 0.2.[3DVAR] performs better 

with respect to the mean l54 

Same as Tabled except h = 0.5. All stabilized algorithms now perform better 
with respect to the mean. [LRExKF] above uses 50 eigenvectors in the low 
rank representation, and performs worse for larger number, indicating that 
the improvement is due largely to the FDF component. The stable PDF 
data are included here as well, since FDF is now competitive as the optimal 
algorithm in terms of mean estimator. This is expected to persist for larger 
time windows and lower frequency observations, since the LRExKF is outside 
of the regime of validity, as shown in Figure IH |55 
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FDF 
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0.00319756 


0.106976 


0.0385305 


0.688464 


0.00312047 


truth (t = 0) 


0.32043 
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0.771936 


0.299957 


truth (t = T) 


0.03871 
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0.688664 


0.038578 



Table 1. Stationary state regime, v = 0.1, T = 2, with h = 0.2 (top table), h = 1 
(middle), and h = 2 (bottom). The first, third, fourth and fifth columns are the norm 
difference, e = ||M — m||/||M||, where M is the mean of the posterior distribution (MCMC), 
the truth, the observation, or the MAP estimator and m is the mean obtained from the 
various methods. The second column is the norm difference, e = ||var[M] — var[t/]||/||var[M]|| 
where var indicates the variance, u is sampled from the posterior (via MCMC), and U is the 
approximate state obtained from the various methods. 
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0.045048 


0.042431 


0.322306 
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0.063289 
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0.0634026 


FDF 
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LRExKF 


0.00599214 


0.030054 
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0.0354624 


truth (t = 0) 
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LRExKF 
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0.174878 
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0.358301 
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truth (t = 0) 
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truth (t = T) 


0.0698665 


- 





0.368335 


0.0705546 


/i = 0.2 


^mean 


(^variance 


^truth 


^ohs 


^m.ap 


MCMC(t = 0) 








0.0459125 


0.293686 


0.00122936 


4DVAR(t = 0) 


0.00183617 


0.0231955 


0.0462013 


0.281137 





MCMC(t = T) 








0.072738 


0.352456 


0.00385795 


4DVAR(t = T) 


0.00386162 


0.0196227 


0.0723178 


0.352145 





3DVAR 


0.285461 


1.72154 


0.300853 
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EnKF 
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0.352625 
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truth (t = 0) 
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0.0496251 


truth {t = T) 


0.072738 
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0.368331 


0.0720492 



Table 2. Same as Tabled! except for strongly chaotic regime with v 
h = 0.02 (top), 0.1 (middle) and 0.2 (bottom). 



0.01, T = 0.2, and 
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h = 0.2 
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0.0322397 
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truth (t = 0) 
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0.00831886 
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0.458527 
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0.487658 


0.460144 


FDF 


0.189832 
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0.19999 


0.25111 


0.191364 


LRExKF 


0.644427 


0.325391 


0.650004 


1.22145 


0.646233 


EnKF 


0.901703 


0.554611 


0.895878 


0.908817 


0.902438 


truth (t = 0) 


0.0318531 


- 





0.303185 


0.0269929 


truth (t = T) 


0.0460821 


- 





0.294524 


0.0448046 



Table 3. Same as Table[2l except T = 1, and h = 0.2 (top) and h = 0.5 (bottom). The vari- 
ance is ommitted from the 4DVAR solutions here, because we are unable to attain solution 
with zero derivative. We must note here that we have taken the approach of differentiating 
and then discretizing. Therefore, over longer time intervals such as this, the error between 
the discretization of the analytical derivative and derivative of the finite-dimensional dis- 
cretized forward map accumulates and the derivative of the objective function is no longer 
well-defined because of this error. Nonetheless, we confirm that we do obtain the MAP 
estimator because the MCMC run does not yield any point of higher probability. 



52 



h=0.02 


(^mean 


(^variance 


^truth 


(^obs 


(^map 


3DVAR 


0.0634553 


6.34057 


0.063289 


0.321959 


0.0634026 


[3DVAR] 


0.142759 


22.2668 


0.153141 


0.309838 


0.143005 


EnKF 


0.035271 


0.274428 


0.0523566 


0.323074 


0.0354624 


[EnKF 


0.167776 


28.1196 


0.175359 


0.304352 


0.167919 


h=0.2 


(^mean 


(^variance 


^truth 


^obs 


^map 


3DVAR 


0.285461 


1.72154 


0.300853 


0.38443 


0.286161 


[3DVAR] 


0.195222 


6.33608 


0.204883 


0.339108 


0.196339 


LRExKF 


0.0750908 


0.0547417 


0.0886932 


0.35073 


0.0726792 


[LRExKF] 


0.156973 


7.64123 


0.169354 


0.310298 


0.156596 


EnKF 


0.137844 


0.372259 


0.159744 


0.353934 


0.137969 


[EnKF 


0.248081 


6.34903 


0.267746 


0.368067 


0.249475 



Table 4. The data of unstable algorithms from Table |2](i/ = 0.01, T = 0.2) are reproduced 
above (with h = 0.02(top) and h = 0.2(bottom)), along with the respective stabilized 
versions in brackets. Here the stabilized versions usually perform worse. Note that over 
longer time scales, the unstabilized version will diverge from the truth, while the stabilized 
one remains close. 
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h=0.2 


^rnean 


(^variance 


dtruth 


Cobs 


(^map 


3DVAR 


0.35571 


3.17803 


0.357351 


0.419614 


0.35557 


[3DVAR] 


0.131964 


11.5997 


0.135572 


0.277895 


0.133265 


LRExKF 


0.101179 


0.28308 


0.0900697 


0.291704 


0.101287 


[LRExKF] 


0.12962 


16.3692 


0.13592 


0.256617 


0.129742 


EnKF 


0.0736613 


0.276947 


0.0755232 


0.282247 


0.0742144 


[EiiKF] 


0.1231 


14.8557 


0.133171 


0.261061 


0.124203 



Table 5. Same as Table HI except T 
respect to the mean. 



5/i = 1 and h = 0.2.[3DVAR] performs better with 
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h=0.5 
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3DVAR 


0.458527 


1.8214 


0.45353 


0.487658 


0.460144 


[3DVAR] 


0.27185 
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0.285351 


0.307263 


0.274663 


LRExKF 


0.644427 


0.325391 
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1.22145 


0.646233 


[LRExKF] 


0.201327 


11.2449 


0.207526 


0.244101 


0.201081 


EnKF 


0.901703 


0.554611 


0.895878 


0.908817 


0.902438 


[EiiKF] 


0.169262 


4.07238 


0.17874 


0.244571 


0.170245 


FDF 


0.189832 


11.4573 


0.19999 


0.25111 


0.191364 



Table 6. Same as Tabled except h = 0.5. All stabilized algorithms now perform better with 
respect to the mean. [LRExKF] above uses 50 eigenvectors in the low rank representation, 
and performs worse for larger number, indicating that the improvement is due largely to 
the FDF component. The stable FDF data are included here as well, since FDF is now 
competitive as the optimal algorithm in terms of mean estimator. This is expected to persist 
for larger time windows and lower frequency observations, since the LRExKF is outside of 
the regime of validity, as shown in Figure ID 
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List of Figures 



Low Reynolds number, stationary solution regime {u = 0.1). The vorticity, 
w{0) (left) of the smoothing distribution at t = 0, and its Fourier coefficients 
(right), are presented for T = lOh = 2. The top and bottom rows are the 
MCMC sample mean and the truth. The MAP estimator is not distinguish- 
able from the mean by eye and so is not displayed. The prior mean is taken 
as a draw from the prior, and hence is not as smooth as the initial condition. 
It is the influence of the prior which makes the MAP estimator and mean 
rough, although structurally the same as the truth (the solution operator is 
smoothing, so these fluctuations are immediately smoothed out - see Fig. [2]). |59 
Low Reynolds number, stationary solution regime {u = 0.1). The vorticity, 
w(T) (left) of the filtering distribution at t = T, and its Fourier coefficients 
(right), are presented for T = lOh = 2. Only the MCMC sample mean is 
shown, since the solutions have been smoothed out and the difference between 

the MAP, mean, and truth is imperceptible l60 

The MCMC histogram at t = (left) and t = T = lOh = 2 (right) together 
with the Gaussian approximation obtained from 4DVAR for low Reynolds 
number, stationary state regime (z/ = 0.1) l61 
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The left panel is the average velocity spectrum on the attractor for v = 0.01. 
The right panel shows the difference between (a) and (b) where: (a) is the 
difference of the truth u''{t) with a solution UT-{t) initially perturbed in the di- 
rection of the dominant local Lyapunov vectors tv, on time-interval of length 
r, with r = 0.02, 0.2, and 0.5 (thus ^,-(0) = n"''(0) + evr); and (b) is the evolu- 
tion of that perturbation under the linearized model Ur{t) = D'^{u'^{0); t)evr- 
The magnitude of perturbation e is determined by the projection of the initial 
posterior covariance in the direction Vt-. The difference plotted thus indicates 
differences between linear and nonlinear evolution with the the direction of 
the initial perturbations chosen to maximize growth and with size of the initial 
perturbations commensurate with the prevalent uncertainty. The relative er- 
ror |[M^(r)-Mt(r)]-f/^(r)|/|t/^(r)| (in Z^) is 0.01, 0.15, and 0.42, respectively, 

for the three chosen values of increasing r 

The MCMC mean, as in Fig. [1] for high Reynolds number, strongly chaotic 
solution regime, p = 0.01, T = lOh = 0.2, t = (top) and t = T(bottom). . . 
Same as Fig. [3l except for strongly chaotic regime, u = 0.01, T = 0.2, and 

h = 0.02. The top is mode Ui^i and the bottom shows mode ^5,5 

The left and right panels, repsectively, show the posterior and prior of the 
covariance from converged innovation statistics from the cycled 3DVAR algo- 
rithm, in comparison to the converged covariance from the FDF algorithm, 

and the posterior distribution 

Example of an unstable trajectory for 3DVAR with u = 0.01, h = 0.2. The top 
left plot shows the norm-squared error between the estimated mean, m{tn) = 
rhn, and the truth, u'^itn), in comparison to the preferred upper bound (i.e. 
the total observation error tr(r), fl2T]) ) and the lower bound tT[KnTK*] fl20|) . 
The other three plots show the estimator, m{t), together with the truth, u^(t), 
and the observations, y„ for a few individual modes 
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9 Example of a variance-inflated stablilized trajectory (Cq — )■ -Co) for [3DVAR] 
with the same external parameters as in Fig. [HI Panels are the same as in 

Fig. E ml 

10 Example of an unstable trajectory for LRExKF with z/ = 0.01, h = 0.5. Panels 

are the same as in Fig. |H1 l68 

11 Example of a variance-inflated stablilized trajectory (updated with model 
b from Section |2] on the complement of the low-rank approximation) for 
[LRExKF] with the same external parameters as in Fig. [TOl Panels are 

the same as in Fig. [TOl |69 
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Fig. 1. Low Reynolds number, stationary solution regime (z/ = 0.1). The vorticity, w{0) 
(left) of the smoothing distribution at t = 0, and its Fourier coefficients (right), are presented 
for T = lOh = 2. The top and bottom rows are the MCMC sample mean and the truth. 
The MAP estimator is not distinguishable from the mean by eye and so is not displayed. 
The prior mean is taken as a draw from the prior, and hence is not as smooth as the initial 
condition. It is the influence of the prior which makes the MAP estimator and mean rough, 
although structurally the same as the truth (the solution operator is smoothing, so these 
fluctuations are immediately smoothed out - see Fig. [2]). 
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Fig. 2. Low Reynolds number, stationary solution regime {u = 0.1). The vorticity, tf (T) 
(left) of the filtering distribution at t = T, and its Fourier coefficients (right), are presented 
for T = lOh = 2. Only the MCMC sample mean is shown, since the solutions have been 
smoothed out and the difference between the MAP, mean, and truth is imperceptible. 
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Fig. 3. The MCMC histogram at t = (left) and t = T = 10/i = 2 (right) together with the 
Gaussian approximation obtained from 4DVAR for low Reynolds number, stationary state 
regime (z/ = 0.1). 
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Fig. 4. The left panel is the average velocity spectrum on the attractor for v = 0.01. 
The right panel shows the difference between (a) and (b) where: (a) is the difference of 
the truth u^(t) with a solution Ur{t) initially perturbed in the direction of the dominant 
local Lyapunov vectors Vr, on time-interval of length r, with r = 0.02, 0.2, and 0.5 (thus 
Ur{0) = u^{0)+evT)] and (b) is the evolution of that perturbation under the linearized model 
Urit) = D'^{u^{0);t)eVr. The magnitude of perturbation e is determined by the projection 
of the initial posterior covariance in the direction Vr- The difference plotted thus indicates 
differences between linear and nonlinear evolution with the the direction of the initial pertur- 
bations chosen to maximize growth and with size of the initial perturbations commensurate 



with the prevalent uncertainty. The relative error |[Mr(T) — u\t)] — Ur{T)\/\Ur{T) 
0.01, 0.15, and 0.42, respectively, for the three chosen values of increasing r. 
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Fig. 5. The MCMC mean, as in Fig. [T]for high Reynolds number, strongly chaotic solution 
regime, i^ = 0.01, T = lOh = 0.2, t = (top) and t = T(bottom). 
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Fig. 6. Same as Fig. [3l except for strongly chaotic regime, v = 0.01, T = 0.2, and h = 0.02. 
The top is mode ui^i and the bottom shows mode ^5,5. 
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Fig. 7. The left and right panels, repsectively, show the posterior and prior of the covariance 
from converged innovation statistics from the cycled 3DVAR algorithm, in comparison to 
the converged covariance from the FDF algorithm, and the posterior distribution. 
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3DVAR,v=0.01, h=0.2 
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3DVAR,v=0.01, h=0.2, Re(u^^) 




Fig. 8. Example of an unstable trajectory for 3DVAR with u = 0.01, h = 0.2. The top left 
plot shows the norm-squared error between the estimated mean, m{tn) = rhn-, and the truth, 
u^{tn)i in comparison to the preferred upper bound (i.e. the total observation error tr(r), 
fl2T]) ) and the lower bound ti[KnTK^ fl20l) . The other three plots show the estimator, m{t), 
together with the truth, M^(t), and the observations, ?/„ for a few individual modes. 
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Fig. 9. Example of a variance-inflated stablilized trajectory (Cq — j- -Cq) for [3DVAR] with 
the same external parameters as in Fig. [SI Panels are the same as in Fig. |H1 
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Fig. 10. Example of an unstable trajectory for LRExKF with u = 0.01, h = 0.5. Panels are 
the same as in Fig. [HI 
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Fig. 11. Example of a variance-inflated stablilized trajectory (updated with model b from 
Section [2] on the complement of the low-rank approximation) for [LRExKF] with the same 
external parameters as in Fig. [101 Panels are the same as in Fig. [TOl 
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